Matthew Mazzella, Susan J Sumner, Shangzhi Gao, Li Su, Nancy Diao, Golam Mostofa, Qazi Qamruzzaman, Wimal Pathmasiri, David C Christiani, Timothy Fennell, Chris Gennings.
Abstract
With advances in technologies that facilitate metabolome-wide analyses, the incorporation of metabolomics in the pursuit of biomarkers of exposure and effect is rapidly evolving in population health studies. However, many analytic approaches are limited in their capacity to address high-dimensional metabolomics data within an epidemiologic framework, including the highly collinear nature of the metabolites and consideration of confounding variables. In this Children's Health Exposure Analysis Resource (CHEAR) network study, we showcase various analytic approaches that are established as well as novel in the field of metabolomics, including univariate single metabolite models, least absolute shrinkage and selection operator (LASSO), random forest, weighted quantile sum (WQSRS) regression, exploratory factor analysis (EFA), and latent class analysis (LCA). Here, in a Bangladeshi birth cohort (n = 199), we illustrate research questions that can be addressed by each analytic method in the assessment of associations between cord blood metabolites (1H NMR measurements) and birth anthropometric measurements (birth weight and head circumference).
Keywords: CHEAR; Collinearity; Dimension reduction; Feature selection; NMR; Targeted metabolomics
Year: 2019 PMID: 31548623 PMCID: PMC8041023 DOI: 10.1038/s41370-019-0162-1
Source DB: PubMed Journal: J Expo Sci Environ Epidemiol ISSN: 1559-0631 Impact factor: 5.563
Analytic Strategies for Evaluating Metabolomics Data in Epidemiologic Studies*
| Question | Methods | Challenges |
|---|---|---|
| Which analytes are best for development of biomarkers of effect or exposure? | Shrinkage methods (e.g., LASSO, elastic net); semi-Bayesian shrinkage methods; tree-based methods (e.g., random forest); environment-wide association study (EWAS) | Ability to address confounding among co-analytes; discerning individual effects among highly collinear analytes; detection of relevant analytes given stringent multiple comparison adjustments [EWAS] |
| What are the interactions between analytes? | Generalized linear models (GLMs) with product interaction terms | Sufficient statistical power to detect interaction; interpretability of effect size estimates; degree of interaction to estimate (i.e., 2-way or higher order) [GLMs] |
| Is there a mixture effect (i.e., a cumulative pattern of association)? | Toxic equivalency (TEQ) summary measures; weighted quantile sum (WQS) regression | Verifying the assumption of additivity between individual components [WQS/TEQ]; availability of toxicity information to create biologically weighted summary measures [TEQ] |
| Can the metabolome be summarized via dimensionality-reduction techniques? | Exploratory factor analysis (EFA); principal components analysis (PCA); partial least squares discriminant analysis (PLS-DA) | Interpretation of factors/components from variable loadings becomes more complicated as the dimensionality of datasets increases; consideration of covariates in extraction of factors/components/phenotypes |
| Are there susceptible subgroups within the population? | Latent class/profile analyses (LCA/LPA) | Consideration of covariates in extraction of profiles |
| Does the metabolome predict disease status or health phenotype? | Artificial neural networks / deep learning; machine learning (e.g., LASSO, support vector machine (SVM), random forest, gradient boosting) | Inference with respect to individual metabolites given the "black box" nature of machine/deep learning methods |
| What metabolic pathways are affected in the observed association? | Pathway enrichment analysis following network-based (WGCNA) or clustering-based (k-means clustering) summaries of metabolomics data | Existing experimental data that inform curated datasets (e.g., KEGG) used to infer functional pathways may be an incomplete representation of biologic pathways; potentially biased representation in the annotation of pathways |
*Adapted from [7].
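The "product interaction terms" row of the table can be illustrated with a Gaussian GLM with identity link, which reduces to ordinary least squares. The sketch below uses simulated data only; the names `m1`, `m2`, and `covariate` and all effect sizes are assumptions chosen for illustration and are not taken from the study.

```python
import numpy as np

# Simulated data: two hypothetical metabolites and one hypothetical confounder.
rng = np.random.default_rng(1)
n = 199
m1 = rng.normal(size=n)
m2 = rng.normal(size=n)
covariate = rng.normal(size=n)

# Outcome with main effects, a 2-way product interaction, and a confounder effect.
y = (0.4 * m1 + 0.2 * m2 + 0.3 * m1 * m2
     + 0.1 * covariate + rng.normal(scale=0.5, size=n))

# Design matrix: intercept, main effects, product term, covariate.
design = np.column_stack([np.ones(n), m1, m2, m1 * m2, covariate])
names = ["intercept", "m1", "m2", "m1:m2", "covariate"]

# A Gaussian GLM with identity link is fitted by least squares.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
estimates = dict(zip(names, coef))

for name, value in estimates.items():
    print(f"{name:>10s}: {value: .3f}")
```

With many metabolites the number of candidate product terms grows quickly, which is the power and interpretability challenge the table notes for this approach.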
Demographic characteristics of the study population (n=199)
| Variables | Mean (min, max) or n (%) |
|---|---|
| Birth weight (z-score) | 2.9 (1.7, 3.5) |
| Birth length | 45.7 (33.0, 64.0) |
| Head circumference | 32.5 (28.0, 36.0) |
| Gestational age (weeks) | 38.3 (33.0, 41.0) |
| Maternal age (years) | 23.1 (18.0, 35.0) |
| Maternal BMI (kg/m2) | 20.5 (15.0, 33.3) |
| Infant Gender | |
| Female (1) | 92 (46.2) |
| Male (0) | 107 (53.8) |
| Maternal Education | |
| 0 | 38 (19.0) |
| 1 | 60 (30.2) |
| 2 | 101 (50.8) |
| Parity | |
| 0 | 88 (44.3) |
| 1 | 111 (55.7) |
Figure 1. Distribution of metabolite levels (µmol/L).
Lactic acid and isopropyl alcohol were among the most and least abundant metabolites detected, respectively.
Figure 2. LASSO-derived associations for (A) birth weight and (B) head circumference.
Estimated association and 95% confidence interval (95% CI) (x-axis) between a 1-standard-deviation (SD) change in each metabolite (y-axis) and birth weight (panel A) or head circumference (panel B).
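To make the "per 1-SD change" scaling concrete, the following is a minimal LASSO sketch fitted by coordinate descent on standardized predictors. It is not the authors' implementation: the data are simulated, the penalty `lam` is an arbitrary assumed value rather than a cross-validated one, and the confidence intervals shown in the figure are not reproduced here.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and SD 1, so coefficients are per 1-SD change."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - y.mean()                 # centering y absorbs the intercept
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # partial residual excluding predictor j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
            resid -= X[:, j] * beta[j]
    return beta

# Simulated "metabolites", with columns 0 and 1 made highly collinear.
rng = np.random.default_rng(0)
n, p = 199, 10
X_raw = rng.normal(size=(n, p))
X_raw[:, 1] = X_raw[:, 0] + 0.1 * rng.normal(size=n)
y = 0.5 * X_raw[:, 0] - 0.3 * X_raw[:, 4] + rng.normal(scale=0.5, size=n)

Z = standardize(X_raw)
beta = lasso_cd(Z, y, lam=0.05)          # lam is an assumed, untuned penalty
selected = np.flatnonzero(beta != 0)     # indices of retained metabolites
print("selected columns:", selected)
print("coefficients:", np.round(beta, 3))
```

With a collinear pair such as columns 0 and 1, the LASSO tends to split or concentrate the shared signal unpredictably, which illustrates the difficulty of discerning individual effects among correlated metabolites.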
Figure 3. Random forest variable importance plots for (A) birth weight and (B) head circumference datasets.
Points in blue and red represent metabolites associated with birth outcomes at p<0.10 and p<0.05 thresholds, respectively.
Figure 4. LOESS-smoothed association between the WQS index (training and validation sets) and (A) birth weight (p=0.018) and (B) head circumference.
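The WQS index plotted in this figure is a weighted sum of quantile-scored metabolites, with weights constrained to be non-negative and to sum to one. The sketch below shows only how such an index is formed and regressed on an outcome. The weights are hypothetical fixed values; in WQS regression they are estimated from a training set, and that estimation step is not shown. All data are simulated.

```python
import numpy as np

def quantile_scores(X, n_quantiles=4):
    """Score each column 0..n_quantiles-1 according to its empirical quantile bin."""
    scores = np.empty(X.shape, dtype=int)
    probs = np.linspace(0, 1, n_quantiles + 1)[1:-1]   # interior cut probabilities
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores

def wqs_index(scores, weights):
    """Weighted sum of quantile scores; weights must be >= 0 and sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return scores @ weights

# Simulated right-skewed "metabolite" concentrations.
rng = np.random.default_rng(2)
n, p = 199, 5
X = rng.lognormal(size=(n, p))

scores = quantile_scores(X)                        # quartile scores 0..3
weights = np.array([0.4, 0.3, 0.2, 0.1, 0.0])      # hypothetical, not estimated
index = wqs_index(scores, weights)

# Simulated outcome, then a simple linear regression on the index.
y = 0.5 * index + rng.normal(scale=0.5, size=n)
design = np.column_stack([np.ones(n), index])
(intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"slope per unit of WQS index: {slope:.3f}")
```

Because every weight has the same sign, the index assumes all components act in one direction, which relates to the additivity assumption listed as a challenge for WQS.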
Figure 5. Generalized linear model associations between EFA-derived metabolite factors and (A) birth weight and (B) head circumference.
Figure 6. Item response likelihood for latent metabolomics classes.
Plots show the likelihood that subjects assigned to different latent classes (blue, green, red lines) would exhibit concentrations of a given metabolite in the lowest quintile (top plot), third quintile (middle plot), or highest quintile (bottom plot).
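The quantities in this figure can be read as class-conditional proportions: among subjects in a given class, the fraction whose metabolite concentration falls in a given quintile. The sketch below computes these proportions from quintile-coded data. The class labels are randomly generated placeholders, since fitting the latent class model itself is not shown, and the helper names are hypothetical.

```python
import numpy as np

def quintile_codes(x):
    """Code a 1-D array into quintiles 1 (lowest) through 5 (highest)."""
    cuts = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, x, side="right") + 1

def item_response_probs(codes, classes, level):
    """For each class, the proportion of members in quintile `level`, per metabolite."""
    return {int(c): (codes[classes == c] == level).mean(axis=0)
            for c in np.unique(classes)}

# Simulated "metabolite" concentrations.
rng = np.random.default_rng(3)
n, p = 199, 4
X = rng.lognormal(size=(n, p))
codes = np.column_stack([quintile_codes(X[:, j]) for j in range(p)])

# Placeholder class assignments; a fitted LCA would supply these in practice.
classes = rng.integers(0, 3, size=n)

lowest = item_response_probs(codes, classes, level=1)    # lowest quintile
highest = item_response_probs(codes, classes, level=5)   # highest quintile
for c in sorted(lowest):
    print(f"class {c}: P(lowest quintile) =", np.round(lowest[c], 2))
```

With random labels the proportions hover near 0.2 for every class; clearly separated classes would instead show proportions well above or below 0.2 for the metabolites that define them.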