Jisu Shin, Sang Hong Lee.
Abstract
Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and cannot feasibly handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank data from 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and that its computational efficiency can be hundreds of times higher than that of existing GxE methods.
Keywords: Biobank-scale data; GxE interaction; LDSC; Reaction norm model; Whole-genome approach
Year: 2021 PMID: 34154633 PMCID: PMC8218431 DOI: 10.1186/s13059-021-02403-1
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Type I error rates of GxEsum to detect GxE at a significance threshold of p-value < 0.05
| Scenarios | Type I error rate |
|---|---|
| Var(GxE) = 0, var(RxE) = 0 | 0.066 |
| Var(GxE) = 0, var(RxE) = 0, G-E correlation = 0.1 | 0.064 |
| Var(GxE) = 0, var(RxE) = 0, R-E correlation = 0.1 | 0.044 |
| Var(GxE) = 0, var(RxE) = 0, G-E correlation = 0.1, R-E correlation = 0.1 | 0.056 |
| Var(GxE) = 0, var(RxE) = 0.1 | 0.044 |
| Var(GxE) = 0, var(RxE) = 0.1, G-E correlation = 0.1 | 0.034 |
| Var(GxE) = 0, var(RxE) = 0.1, R-E correlation = 0.1 | 0.028 |
| Var(GxE) = 0, var(RxE) = 0.1, G-E correlation = 0.1, R-E correlation = 0.1 | 0.054 |
| Average | |
We simulated phenotypic data based on a real genotypic dataset (ARIC GWAS) including 7263 participants with 583,085 SNPs, using various scenarios. The phenotypes were standardized such that the phenotypic mean was 0 and the phenotypic variance was 1. The type I error rate (i.e., the false-positive rate) was estimated from 500 replicates for each scenario.
GxE: genotype-by-environment interaction; RxE: residual-by-environment interaction; G-E correlation: genotype-environment correlation; R-E correlation: residual-environment correlation
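As a rough check on how much scatter to expect in the table above, the sketch below computes the mean of the eight listed rates and the sampling range of an empirical rate from 500 replicates when the true rate is exactly 0.05. The normal approximation to the binomial and all variable names are assumptions made for illustration; they are not part of the GxEsum analysis.

```python
import math

# Type I error rates copied from the table above (eight null scenarios).
rates = [0.066, 0.064, 0.044, 0.056, 0.044, 0.034, 0.028, 0.054]
n_rep = 500    # replicates per scenario, as stated in the table note
alpha = 0.05   # nominal significance threshold

mean_rate = sum(rates) / len(rates)

# Binomial standard error of an empirical rate if the true rate equals alpha
# (assumption: normal approximation to the binomial).
mc_se = math.sqrt(alpha * (1 - alpha) / n_rep)
lower = alpha - 1.96 * mc_se
upper = alpha + 1.96 * mc_se

# Scenarios whose empirical rate falls outside the approximate 95% range.
outside = [r for r in rates if not (lower <= r <= upper)]

print(f"mean rate = {mean_rate:.5f}")
print(f"approximate 95% range under the null = [{lower:.3f}, {upper:.3f}]")
print(f"rates outside that range: {outside}")
```

Under these assumptions the range is roughly 0.031 to 0.069, so most of the listed rates are within sampling noise of 0.05, and the one that falls outside (0.028) errs on the conservative side.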
Type I error rates of GxEsum when using binary disease traits with various population prevalence
| Scenarios | Population prevalence (k) | Type I error rate |
|---|---|---|
| Var(GxE) = 0, Var(RxE) = 0 | 0.025 | 0.052 |
| | 0.05 | 0.042 |
| | 0.1 | 0.076 |
| | 0.5 | 0.054 |
| Var(GxE) = 0, Var(RxE) = 0.1 (on the liability scale) | 0.025 | 0.044 |
| | 0.05 | 0.036 |
| | 0.1 | 0.050 |
| | 0.5 | 0.052 |
| Average | | |
We simulated quantitative phenotypic data based on a real genotypic dataset (ARIC GWAS) including 7263 individuals with 583,085 SNPs. The phenotypes were standardized such that the mean was 0 and the variance was 1. We then applied the liability threshold model to these standardized phenotypes to generate affected or unaffected disease status for each individual, using various values for the population prevalence (k = 0.025, 0.05, 0.1 or 0.5). The type I error rate at a significance threshold of p-value < 0.05 was estimated from 500 replicates for each scenario and population prevalence.
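The liability threshold step described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: liabilities are drawn directly from a standard normal distribution as a stand-in for the standardized simulated phenotypes, and the function name `simulate_case_status` is hypothetical.

```python
import random
from statistics import NormalDist


def simulate_case_status(n, prevalence, seed=0):
    """Return 0/1 disease status for n individuals under a liability threshold model.

    Assumption: liability is standard normal (mean 0, variance 1), standing in
    for the standardized quantitative phenotype described in the text.
    """
    rng = random.Random(seed)
    # Individuals whose liability exceeds this threshold are affected, so the
    # expected proportion of cases equals the population prevalence k.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    liabilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [1 if value > threshold else 0 for value in liabilities]


for k in (0.025, 0.05, 0.1, 0.5):
    status = simulate_case_status(100_000, k, seed=1)
    print(f"k = {k}: observed proportion of cases = {sum(status) / len(status):.4f}")
```

The observed proportion of cases should be close to each chosen prevalence, which is what makes k interpretable as the population prevalence in the table.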
Fig. 1 The ratio of the standard error (SE) from GxEsum to that from RNM using UK Biobank data. The SEs of the GxE variance estimated from GxEsum with sample sizes ranging from 50,000 to 288,837 were compared to the SE of the GxE variance estimated from RNM with a sample size of 50,000. The dashed horizontal line marks a ratio of 1.
Fig. 2 Computing time with various sample sizes used in GxEsum and RNM analyses. As the sample size increases, the computing time of RNM (red) increases exponentially, while that of GxEsum (blue) is almost invariant (less than a minute).
Estimates obtained from GxEsum analysis using real data
| Main trait | Environmental variable | Main additive genetic variance | GxE interaction variance | p-value |
|---|---|---|---|---|
| BMI | Age | 0.216 (0.007) | 0.004 (0.002) | 1.86E−02 |
| | NEU | 0.216 (0.007) | 0.007 (0.002) | 1.61E−05 |
| | PA | 0.218 (0.007) | 0.003 (0.001) | 2.57E−02 |
| | ALC | 0.216 (0.007) | 0.003 (0.002) | 5.98E−02 |
| Hypertension | BMI | 0.152 (0.008) | 0.006 (0.002) | 2.09E−03 |
| | WHR | 0.154 (0.008) | 0.005 (0.002) | 3.21E−02 |
| | BFP | 0.151 (0.008) | 0.008 (0.003) | 2.66E−02 |
| Type 2 diabetes | BMI | 0.141 (0.014) | 0.085 (0.022) | 1.58E−04 |
| | Diastolic BP | 0.198 (0.014) | −0.004 (0.006) | 5.38E−01 |
| | Systolic BP | 0.204 (0.014) | −0.006 (0.006) | 3.17E−01 |
We used a quantitative trait (BMI) and binary disease traits (hypertension and type 2 diabetes) because BMI is known to be modulated by age and lifestyle factors such as NEU, ALC, and PA [8, 22, 23], and hypertension and type 2 diabetes are known to be associated with obesity-related traits such as BMI, WHR, and BP [24, 25]. The p-value is from a Wald test of whether the estimated GxE variance differs from zero. The estimates on the observed scale for the binary traits were transformed to those on the liability scale using the Robertson transformation [17, 26]. All estimates were from the GxEsum model.
NEU: neuroticism score; PA: physical activity; ALC: alcohol intake frequency; WHR: waist-hip ratio; BFP: body fat percentage; BP: blood pressure
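The two calculations mentioned in the table note can be sketched as below. Both functions are illustrative assumptions rather than the GxEsum implementation: the Wald test is taken to be two-sided and the values in parentheses are read as standard errors, and the scale conversion uses the commonly cited form of the Robertson transformation for a population sample, multiplying by k(1 − k)/z², where z is the standard normal density at the liability threshold. The prevalence in the example call is hypothetical, since the prevalences used for the real data are not given here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def wald_p_value(estimate, se):
    """Two-sided Wald test p-value for H0: the parameter equals zero."""
    z = estimate / se
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def observed_to_liability(var_obs, prevalence):
    """Convert a variance proportion from the observed 0/1 scale to the liability scale.

    Assumption: population sample without case-control ascertainment.
    """
    threshold = _STD_NORMAL.inv_cdf(1.0 - prevalence)
    height = _STD_NORMAL.pdf(threshold)  # normal density at the threshold
    return var_obs * prevalence * (1.0 - prevalence) / height ** 2


# Systolic BP row of the table: estimate -0.006 with 0.006 in parentheses.
print(f"Wald p-value: {wald_p_value(-0.006, 0.006):.3f}")

# Hypothetical observed-scale value and prevalence, for illustration only.
print(f"liability-scale value: {observed_to_liability(0.05, 0.1):.4f}")
```

Recomputing p-values from the rounded table entries gives only approximate agreement with the reported values, because the published p-values would have been computed from unrounded estimates. The Systolic BP row is a convenient check because its ratio of estimate to standard error is exactly −1.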