Nicholas J W Rattray, Nicole C Deziel, Joshua D Wallach, Sajid A Khan, Vasilis Vasiliou, John P A Ioannidis, Caroline H Johnson.
Abstract
BACKGROUND: Over the past 20 years, advances in genomic technology have enabled unparalleled access to the information contained within the human genome. However, the multiple genetic variants associated with various diseases typically account for only a small fraction of the disease risk. This may be due to the multifactorial nature of disease mechanisms, the strong impact of the environment, and the complexity of gene-environment interactions. Metabolomics is the quantification of small molecules produced by metabolic processes within a biological sample. Metabolomics datasets contain a wealth of information that reflects the disease state and results from both genetic variation and environment. Thus, metabolomics is being widely adopted in epidemiologic research to identify disease risk traits. In this review, we discuss the evolution and challenges of metabolomics in epidemiologic research, particularly for assessing environmental exposures and providing insights into gene-environment interactions and mechanisms of biological impact.

MAIN TEXT: Metabolomics can be used to measure the complex global modulating effect that an exposure event has on an individual phenotype. Combining information derived from all levels of protein synthesis and subsequent enzymatic action on metabolite production can reveal the individual exposotype. We discuss some of the methodological and statistical challenges in dealing with this type of high-dimensional data, such as the impact of study design, analytical biases, and biological variance. We show examples of disease risk inference from metabolic traits using metabolome-wide association studies. We also evaluate how these studies may drive precision medicine approaches and pharmacogenomics, which have up to now been inefficient. Finally, we discuss how to promote transparency and open science to improve reproducibility and credibility in metabolomics.
Keywords: Chemometrics; Exposome; Exposotype; Genetic epidemiology; Genomics; Metabolomics
Year: 2018 PMID: 29373992 PMCID: PMC5787293 DOI: 10.1186/s40246-018-0134-x
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 (a) Environmental health paradigm. (b) Exposure and the central dogma of molecular biology
Mass spectrometry metabolite databases for identification of environmental exposures
| Database name | Description | URL |
|---|---|---|
| Human metabolome database (HMDB) | 114,113 xenobiotic and endogenous metabolites with chemical, biochemical, and clinical information. | |
| Toxic exposome database (T3DB) | 3767 toxic compounds, targets and gene expression data, part of the HMDB suite. | |
| METLIN | 961,829 xenobiotic and endogenous metabolites with chemical information. Contains information from DSSTox. | |
| Exposome-Explorer | 692 dietary and pollutant biomarkers, with concentration values measured in biospecimens and intraclass correlation coefficients. | |
| Madison-Qingdao Metabolomics Consortium Database | 20,300 xenobiotics and endogenous metabolites, with chemical information | |
| Drugbank | 10,513 drug entries with drug target information, part of the HMDB suite | |
| PubChem | 93,977,784 compounds, xenobiotic and endogenous metabolites but also peptides, and chemically altered macromolecules. Data is derived from hundreds of sources. | |
| CompTox Chemistry Dashboard | 758,000 xenobiotics with chemical information compiled from multiple sources: PubChem and the US EPA’s DSSTox, ACToR, ToxCast, EDSP21, and CPCat. | |
Common statistical methods and tests used in epidemiology, genetics, and metabolomics, with reference link to descriptive articles on appropriate general use
| Class of test | Type of test | Application/description | Refs |
|---|---|---|---|
| Descriptive | Mean | The simplest of tests used to describe basic features within data. | Covered in all general statistical textbooks and used in most if not all scientific disciplines. |
| | Range, variance, SD | Describe the spread of data within a population | |
| Inferential | | Predicts/infers an observed mean, frequency, or proportion to a predetermined value, respectively. | |
| | ANOVA | Parametric method that tests the hypothesis that the means of two or more populations are equal. Frequently used to compare variance among groups relative to variance within groups | |
| | Kruskal-Wallis | Non-parametric rank-based method to test for statistically significant differences between two or more groups of an independent variable on a continuous/ordinal variable | |
| Scaling | Centering, auto, pareto, log, MD | Data pretreatment methods aimed at reducing biological and analytical bias | |
| Principal component | PCA | Unsupervised dimensional reduction procedure used to explain the maximum variance within complex datasets. | |
| | Multiblock PCA | PCA extension designed to find the underlying relationships between sets of related data | |
| | ANOVA-PCA | Uses PC dimensional reduction to determine the effect of the experimental factors on multiple dependent variables | |
| | PC-DFA | Supervised test that summarizes the differentiation between groups while overlooking within-group variation. | |
| Regression | Linear | Summarizes and quantifies the relationship between two continuous variables | |
| | PLS | Used to predict a set of dependent variables from a large set of independent variables | |
| | O-PLS | Orthogonal signal correction on PLS that maximizes the explained covariance on the first latent variable | |
| | PLS-R | Combination of the predictive power of regression alongside the ability to deal with high dimensionality and multicollinearity of variables. | |
| | PLS-DA | Supervised approach to prediction on discrete variables | |
| | LASSO | Parsimonious approach to variable selection and regularization in order to enhance interpretability and reduce noise | |
| | Elastic net | Variable reduction approach where strongly correlated predictors coalesce in or out of the model together | |
Definitions: SD standard deviation, MD median, PCA principal component analysis, ANOVA analysis of variance, PC-DFA principal component discriminant function analysis, PLS partial least squares (also known as projection to latent structures), O-PLS orthogonal PLS, PLS-R PLS regression, LASSO least absolute shrinkage and selection operator
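As an illustration of the "Scaling" row in the table above, the following is a minimal sketch of four common pretreatment steps (centering, autoscaling, Pareto scaling, and log transformation) applied to a single metabolite feature. It is not taken from the review: the function names and the example intensities are hypothetical, the sample standard deviation is assumed as the scaling factor, and base 10 is assumed for the log transform.

```python
import math
from statistics import mean, stdev


def center(values):
    """Mean-centering: subtract the feature mean from every sample."""
    m = mean(values)
    return [v - m for v in values]


def autoscale(values):
    """Autoscaling (unit variance): centered values divided by the sample SD."""
    s = stdev(values)
    return [v / s for v in center(values)]


def pareto_scale(values):
    """Pareto scaling: centered values divided by the square root of the SD."""
    s = stdev(values)
    return [v / math.sqrt(s) for v in center(values)]


def log_transform(values):
    """Log transformation (base 10 assumed); needs strictly positive values."""
    return [math.log10(v) for v in values]


# Hypothetical peak intensities for one metabolite across five samples.
intensities = [100.0, 200.0, 300.0, 400.0, 500.0]
autoscaled = autoscale(intensities)
pareto_scaled = pareto_scale(intensities)
```

Autoscaling gives every feature unit variance, so low-abundance metabolites weigh as much as high-abundance ones; Pareto scaling divides by the square root of the SD instead, which shrinks large features less aggressively and keeps more of the original data structure.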
Fig. 2 The biological and analytical aspects of bias and variance that can lead to a tendency towards erroneous results in both untargeted and targeted metabolomics