Anastasia M Lucas, Nicole E Palmiero, John McGuigan, Kristin Passero, Jiayan Zhou, Deven Orie, Marylyn D Ritchie, Molly A Hall.
Abstract
While genome-wide association studies are an established method of identifying genetic variants associated with disease, environment-wide association studies (EWAS) highlight the contribution of nongenetic components to complex phenotypes. However, the lack of high-throughput quality control (QC) pipelines for EWAS data encourages analysis plans in which the data are cleaned after a first-pass analysis, which can lead to bias, or are cleaned manually, which is arduous and susceptible to user error. We offer novel software, CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures (CLARITE), as a tool to efficiently clean environmental data, perform regression analysis, and visualize results on a single platform through user-guided automation. It exists as both an R package and a Python package. Though CLARITE focuses on EWAS, it is also intended to improve the QC process for phenotypes and clinical lab measures for a variety of downstream analyses, including phenome-wide association studies and gene-environment interaction studies. To demonstrate the utility of CLARITE, we performed a novel EWAS in the National Health and Nutrition Examination Survey (NHANES) (overall N = 9063 in Discovery, overall N = 9874 in Replication) for body mass index (BMI) and over 300 environmental variables post-QC, adjusting for sex, age, race, socioeconomic status, and survey year. The analysis used survey weights along with cluster and strata information to account for the complex survey design. Sixteen BMI results replicated at a Bonferroni-corrected p < 0.05. The top replicating results were serum levels of γ-tocopherol (vitamin E) (Discovery Bonferroni p: 8.67 × 10⁻¹², Replication Bonferroni p: 2.70 × 10⁻⁹) and iron (Discovery Bonferroni p: 1.09 × 10⁻⁸, Replication Bonferroni p: 1.73 × 10⁻¹⁰). Results of this EWAS are important to consider for metabolic trait analysis, as BMI is tightly associated with these phenotypes. As such, exposures predictive of BMI may be useful for covariate and/or interaction assessment of metabolic-related traits. CLARITE allows improved data quality for EWAS, gene-environment interactions, and phenome-wide association studies by establishing a high-throughput quality control infrastructure. Thus, CLARITE is recommended for studying the environmental factors underlying complex disease.
Keywords: body mass index; complex traits; exposome; metabolic disease; quality control
Year: 2019 PMID: 31921293 PMCID: PMC6930237 DOI: 10.3389/fgene.2019.01240
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flowchart depicting a typical workflow when using the CLARITE package. The user starts with raw data and alternates between summary steps (dashed lines) and filtering/quality control (QC) steps (solid lines) based on variable type (indicated by color) and either user-defined or default thresholds informed by the summary output. Once data are sufficiently cleaned, environment-wide association studies (EWAS) can be run.
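The alternation between summary and filtering steps can be illustrated with a minimal sketch in plain Python. It does not use the CLARITE API: the function names, the level-count thresholds, and the example variables are all hypothetical and chosen only to show how a summary by variable type can inform a later filter.

```python
from collections import Counter

# Hypothetical thresholds, for illustration only (not CLARITE defaults).
CATEGORICAL_MAX_LEVELS = 6   # 3..6 distinct values -> categorical
CONTINUOUS_MIN_LEVELS = 15   # >= 15 distinct values -> continuous


def classify_variable(values):
    """Assign a type from the number of distinct non-missing values."""
    n_levels = len({v for v in values if v is not None})
    if n_levels < 2:
        return "constant"
    if n_levels == 2:
        return "binary"
    if n_levels <= CATEGORICAL_MAX_LEVELS:
        return "categorical"
    if n_levels >= CONTINUOUS_MIN_LEVELS:
        return "continuous"
    return "needs_review"  # ambiguous: left for the user to decide


def summarize_types(table):
    """Summary step: count variables of each type in a {name: values} mapping."""
    return Counter(classify_variable(vals) for vals in table.values())


def filter_min_n(table, min_n):
    """Filtering step: keep variables with at least min_n non-missing values."""
    return {
        name: vals
        for name, vals in table.items()
        if sum(v is not None for v in vals) >= min_n
    }


# Hypothetical raw data; None marks a missing observation.
raw = {
    "smoker": [0, 1, 1, 0, None, 1],
    "education": [1, 2, 3, 2, 1, 3],
    "serum_iron": [float(x) for x in range(60, 80)],
    "sparse": [None, None, None, 1.0, None, None],
}

print(summarize_types(raw))          # summary informs the threshold choice
cleaned = filter_min_n(raw, min_n=5)  # then filter with that threshold
print(sorted(cleaned))
```

In this sketch the summary output is inspected first and the filter threshold is chosen afterwards, mirroring the dashed and solid steps of the flowchart.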
Environmental variables and sample sizes for environment-wide association studies (EWAS) in Discovery and Replication datasets.
| Dataset | Binary | Categorical | Continuous | Overall Sample Size |
|---|---|---|---|---|
| Discovery | 60 | 4 | 312 | 9063 |
| Replication | 71 | 5 | 343 | 9874 |
| Shared | 48 | 4 | 280 | n/a |
Number of variables remaining and overall sample size after quality control (QC) included for EWAS in the Discovery and Replication datasets.
Replicating results reaching Bonferroni significance. The first column shows a list of 16 exposures that are Bonferroni significant at the 0.05 level. The fourth and fifth columns are the raw p-values from the original datasets. Columns 6-17 are the Bonferroni corrected p-values from the Discovery and Replication datasets after each adjustment was performed.
| Variable | Category | Description | Raw p-value (Discovery) | Raw p-value (Replication) | Bonferroni corrected p-value w/ T2D adjustment (Discovery) | Bonferroni corrected p-value w/ T2D adjustment (Replication) | Bonferroni corrected p-value w/ CAD adjustment (Discovery) | Bonferroni corrected p-value w/ CAD adjustment (Replication) | Bonferroni corrected p-value w/ HDL adjustment (Discovery) | Bonferroni corrected p-value w/ HDL adjustment (Replication) | Bonferroni corrected p-value w/ LDL adjustment (Discovery) | Bonferroni corrected p-value w/ LDL adjustment (Replication) | Bonferroni corrected p-value w/ TC adjustment (Discovery) | Bonferroni corrected p-value w/ TC adjustment (Replication) | Bonferroni corrected p-value w/ TG adjustment (Discovery) | Bonferroni corrected p-value w/ TG adjustment (Replication) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LBXGTC | nutrients | γ-Tocopherol (ug/dL) | 2.611e-14 | 2.729e-11 | 1.147e-10 | 9.566e-09 | 4.286e-11 | 1.031e-08 | 1.908e-10 | 1.246e-07 | 8.675e-07 | 2.446e-08 | 2.552e-11 | 5.372e-09 | 2.911e-08 | 6.823e-10 |
| LBXIRN | nutrients | Iron (ug/dL) | 3.283e-11 | 1.748e-12 | 1.759e-08 | 4.527e-10 | 2.185e-08 | 5.718e-10 | 7.207e-06 | 7.294e-08 | 5.367e-04 | 3.289e-07 | 2.178e-08 | 3.543e-10 | 5.423e-08 | 1.001e-09 |
| URXUPT | heavy metals | Platinum, urine (ug/L) | 1.281e-10 | 2.322e-04 | 1.307e-06 | 7.970e-02 | 4.795e-07 | 2.207e-02 | 1 | NA | 1.394e-03 | 2.342e-04 | 1.792e-07 | 2.462e-02 | 1 | NA |
| total_days_drink_year | alcohol use | days drink in year | 4.563e-07 | 1.710e-10 | 4.298e-04 | 3.352e-08 | 2.426e-04 | 4.133e-08 | 1 | NA | 1 | 2.329e-06 | 2.409e-04 | 2.890e-08 | 1.973e-04 | 1.375e-07 |
| LBXBEC | nutrients | trans-Beta carotene (ug/dL) | 8.394e-07 | 1.690e-08 | 1.996e-03 | 2.607e-06 | 1.478e-03 | 2.657e-06 | 6.841e-03 | 1.016e-05 | 1.719e-01 | 9.840e-04 | 1.785e-03 | 2.363e-06 | 2.594e-03 | 4.075e-06 |
| cigarette_smoking | smoking behavior | Current Cigarette Smoker? | 8.735e-07 | 8.627e-05 | 4.984e-04 | 1.939e-02 | 5.460e-04 | 1.446e-02 | 8.060e-04 | 9.956e-04 | 7.293e-01 | 1 | 3.938e-04 | 1.134e-02 | 7.821e-04 | 1.073e-02 |
| LBXCBC | nutrients | cis-Beta carotene (ug/dL) | 9.142e-07 | 1.159e-09 | 2.233e-03 | 2.239e-07 | 1.487e-03 | 2.177e-07 | 7.286e-03 | 9.552e-07 | 9.814e-02 | 2.328e-04 | 1.963e-03 | 1.835e-07 | 2.936e-03 | 2.984e-07 |
| LBXCOT | cotinine | Cotinine (ng/mL) | 3.989e-06 | 9.673e-07 | 8.159e-04 | 5.675e-04 | 1.944e-03 | 8.101e-05 | 3.626e-05 | 1.216e-05 | 8.630e-02 | 1.205e-01 | 1.687e-03 | 1.264e-04 | 4.326e-04 | 4.587e-05 |
| LBXLUZ | nutrients | Lutein and zeaxanthin (ug/dL) | 8.463e-06 | 2.312e-11 | 1.356e-02 | 2.214e-09 | 1.404e-02 | 7.595e-09 | 7.692e-02 | 9.480e-06 | 4.178e-02 | 2.035e-09 | 1.292e-02 | 4.697e-09 | 8.614e-03 | 2.491e-09 |
| SMQ040 | smoking behavior | Do you now smoke cigarettes... | 1.509e-05 | 3.521e-05 | 4.154e-03 | 6.608e-03 | 7.607e-03 | 2.047e-03 | 1.409e-03 | 5.894e-04 | 1 | 9.251e-01 | 7.106e-03 | 4.389e-03 | 3.412e-03 | 1.908e-03 |
| LISINOPRIL | pharmaceutical | LISINOPRIL | 2.919e-05 | 1.665e-08 | 5.157e-01 | 7.621e-03 | 6.523e-03 | 2.183e-06 | 1.958e-01 | 7.836e-04 | 1.209e-02 | 5.763e-02 | 1.744e-02 | 2.942e-06 | 6.626e-02 | 3.901e-05 |
| LBXBPB | heavy metals | Lead (ug/dL) | 3.858e-05 | 1.106e-07 | 1.686e-02 | 1.037e-04 | 1.760e-02 | 1.321e-05 | 1.411e-02 | 3.311e-04 | 2.305e-01 | 1.160e-04 | 1.538e-02 | 1.527e-05 | 1.811e-02 | 1.611e-05 |
| LBXVID | nutrients | Vitamin D (ng/mL) | 4.627e-05 | 3.048e-12 | 6.081e-02 | 4.580e-09 | 4.478e-02 | 2.545e-10 | 2.821e-01 | 2.356e-07 | 1 | 1.251e-06 | 4.654e-02 | 7.294e-10 | 6.661e-02 | 2.987e-09 |
| DR1TALCO | food component recall | Alcohol (gm) | 8.099e-05 | 9.210e-06 | 5.669e-02 | 2.874e-03 | 1.474e-02 | 5.996e-04 | 1 | NA | 1 | NA | 3.821e-02 | 9.168e-04 | 3.996e-02 | 1.217e-03 |
| LBXBCD | heavy metals | Cadmium (ug/L) | 1.087e-04 | 6.826e-07 | 8.970e-02 | 3.771e-04 | 5.373e-02 | 1.230e-04 | 8.827e-02 | 3.398e-05 | 6.130e-02 | 9.388e-02 | 3.916e-02 | 7.718e-05 | 3.989e-02 | 4.872e-05 |
| DR1TP204 | food component recall | PFA 20:4 (Eicosatetraenoic) (gm) | 1.329e-04 | 1.573e-05 | 5.840e-02 | 2.585e-03 | 4.481e-02 | 1.912e-03 | 1.079e-03 | 9.831e-04 | 1.998e-01 | 5.728e-02 | 4.506e-02 | 2.301e-03 | 4.605e-02 | 5.319e-03 |
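As a worked check, the Bonferroni p-values quoted in the abstract can be recovered from the raw p-values in this table. The sketch below assumes the correction multiplies each raw p-value by the number of tests (332 in Discovery and 99 in Replication, taken from the overview table of results) and caps the product at 1; the function name is illustrative.

```python
def bonferroni(p_raw, n_tests):
    """Bonferroni-adjusted p-value: raw p times the number of tests, capped at 1."""
    return min(1.0, p_raw * n_tests)


# Test counts taken from the overview table (assumed to be the multipliers).
N_DISCOVERY, N_REPLICATION = 332, 99

# Raw p-values (Discovery, Replication) copied from the table above.
raw_p = {
    "LBXGTC": (2.611e-14, 2.729e-11),  # gamma-tocopherol
    "LBXIRN": (3.283e-11, 1.748e-12),  # iron
}

adjusted = {
    name: (bonferroni(p_dis, N_DISCOVERY), bonferroni(p_rep, N_REPLICATION))
    for name, (p_dis, p_rep) in raw_p.items()
}

for name, (p_dis, p_rep) in adjusted.items():
    print(f"{name}: Discovery {p_dis:.2e}, Replication {p_rep:.2e}")
# LBXGTC: Discovery 8.67e-12, Replication 2.70e-09
# LBXIRN: Discovery 1.09e-08, Replication 1.73e-10
```

Under this assumption the products agree with the Bonferroni p-values reported in the abstract for both exposures.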
Overview of number of results in Discovery, Replication, and both datasets at varying significance thresholds.
| Dataset | Tests | FDR 0.1 | Bonferroni 0.05 | Bonferroni 0.01 |
|---|---|---|---|---|
| Discovery | 332 | 99 | 18 | 11 |
| Replication | 99 | 62 | 29 | 25 |
| Both | NA | 62 | 16 | 10 |
Number of tests performed in the discovery and replication analyses, number of results passing false discovery rate (FDR) < 0.1, and number of results passing Bonferroni p-value threshold (alphas 0.05 and 0.01).
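The counts in this table can be produced by a procedure along the following lines. This is a minimal sketch, not the authors' code: it assumes the FDR is controlled with the Benjamini-Hochberg procedure and that only Discovery results passing FDR < 0.1 are carried into Replication (a reading consistent with the 99 Replication tests, but an assumption nonetheless). The variable names and p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(results, method, alpha):
    """Names in a {name: raw p} mapping whose adjusted p-value is below alpha."""
    names, pvalues = list(results), list(results.values())
    if method == "bonferroni":
        adjusted = [min(1.0, p * len(pvalues)) for p in pvalues]
    elif method == "fdr":
        adjusted = benjamini_hochberg(pvalues)
    else:
        raise ValueError(f"unknown method: {method}")
    return {name for name, q in zip(names, adjusted) if q < alpha}


# Hypothetical raw p-values for five exposures in the Discovery set.
discovery = {"a": 0.0001, "b": 0.004, "c": 0.03, "d": 0.2, "e": 0.6}
disc_fdr = significant(discovery, "fdr", 0.1)
disc_bonf = significant(discovery, "bonferroni", 0.05)

# Assumed two-stage design: only Discovery FDR hits are tested in Replication.
replication = {"a": 0.001, "b": 0.3, "c": 0.01}
assert set(replication) == disc_fdr
rep_bonf = significant(replication, "bonferroni", 0.05)

replicated = disc_bonf & rep_bonf  # the "Both" row at Bonferroni 0.05
print(sorted(disc_fdr), sorted(disc_bonf), sorted(rep_bonf), sorted(replicated))
```

Each dataset is corrected for its own number of tests, so a result counts toward the "Both" row only if it passes the threshold in Discovery and in Replication independently.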
Figure 2. Environment-wide association studies (EWAS) results for body mass index (BMI) in Discovery and Replication datasets using CLARITE. The Manhattan plot displays exposure categories along the x-axis and -log10(p-value) along the y-axis, with results shown for the Discovery (circle) and Replication (triangle) datasets. The red line denotes the Bonferroni threshold (alpha: 0.05) for the number of tests run in the Discovery dataset (305), and the blue line denotes the Bonferroni threshold (alpha: 0.05) for the number of tests run in the Replication dataset (99). The 16 replicating results with Bonferroni-corrected p-value < 0.05 are labeled.
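The height of each threshold line follows from the test counts given in the caption. A short sketch, assuming each line is drawn at -log10(alpha / number of tests); the function name is illustrative.

```python
import math


def bonferroni_line(alpha, n_tests):
    """Height of a Bonferroni threshold on a -log10(p-value) axis."""
    return -math.log10(alpha / n_tests)


# Test counts as stated in the figure caption.
print(f"Discovery (305 tests): {bonferroni_line(0.05, 305):.2f}")   # 3.79
print(f"Replication (99 tests): {bonferroni_line(0.05, 99):.2f}")   # 3.30
```

Because fewer tests are run in Replication, its line sits lower on the plot than the Discovery line.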