Literature DB >> 34544119

Variance-component-based meta-analysis of gene-environment interactions for rare variants.

Abstract

Complex diseases are often caused by interplay between genetic and environmental factors. Existing gene-environment interaction (G × E) tests for rare variants largely focus on detecting gene-based G × E effects in a single study; thus, their statistical power is limited by the sample size of the study. Meta-analysis methods that synthesize summary statistics of G × E effects from multiple studies for rare variants are still limited. Based on variance component models, we propose four meta-analysis methods of testing G × E effects for rare variants: HOM-INT-FIX, HET-INT-FIX, HOM-INT-RAN, and HET-INT-RAN. Our methods consider homogeneous or heterogeneous G × E effects across studies and treat the main genetic effect as either fixed or random. Through simulations, we show that the empirical distributions of the four meta-statistics under the null hypothesis align with their expected theoretical distributions. When the interaction effect is homogeneous across studies, HOM-INT-FIX and HOM-INT-RAN have as much statistical power as a pooled analysis conducted on a single interaction test with individual-level data from all studies. When the interaction effect is heterogeneous across studies, HET-INT-FIX and HET-INT-RAN provide higher power than pooled analysis. Our methods are further validated via testing 12 candidate gene-age interactions in blood pressure traits using whole-exome sequencing data from UK Biobank.

Entities: Chemical

Keywords: UK Biobank; blood pressure; gene–environment interaction; meta-analysis; rare variants; variance component method; whole-exome sequencing

Mesh：

Year: 2021 PMID： 34544119 PMCID： PMC8661424 DOI： 10.1093/g3journal/jkab203

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Introduction

With the advancement of high-throughput sequencing technologies (Ansorge 2009), a variety of statistical tests (Li and Leal 2008; Madsen and Browning 2009; Wu ; Lee ; Cheng ; Sun ; Liu ) have been developed to identify associations between rare variants and complex diseases or traits. For most complex diseases or traits, thousands of genetic variants identified by genome-wide association studies (GWASs) explain only a small proportion of their heritabilities, leaving much of those heritabilities still unexplained; this phenomenon is known as “missing heritability” (Tong and Hernandez 2020). Variants identified through GWASs are mostly common variants with minor allele frequencies (MAFs) larger than 5%. Studies have demonstrated that rare variants may explain some of the “missing heritability” (Lee ; Kao ; Yu ). For example, two rare variants have been identified to be associated with low-density lipoprotein cholesterol and high-density lipoprotein cholesterol (Igartua ). Five rare variants have been associated with low systolic blood pressure (SBP) (He ). On the other hand, complex diseases are often caused by a combination of genetic factors, environmental exposures, and the interplay between them; thus, gene–environment interaction (G × E) may explain some of the “missing heritability” of complex diseases (Lim ). Recently, an increasing number of statistical methods have been developed for testing G × E with rare variants. These methods aim to detect individually weak but collectively strong G × E effects in a functional set, typically a gene or a gene region. Tzeng ) proposed a similarity-based regression approach (SIMreg) to test the G × E of rare variants in which trait similarity is regressed on pairwise genetic similarity. The approach is applicable for both continuous and binary traits and allows testing for main genetic effects, interaction effects, and joint effects. Jiao presented a gene-based G × E test (SBERIA) for case-control studies in which gene-environment correlations are used to filter out variants showing no promising G × E effects. Lin ) proposed a G × E set association test (GESAT) under generalized linear models and tested the interaction effect between a marker set and an environmental variable for continuous and discrete traits. GESAT assumes the G × E effect to be random and employs a variance component score test in a linear mixed model framework. Chen ) proposed two G × E tests for rare variants: INT-FIX, which treats main genetic effects as fixed effects, and INT-RAN, which treats main genetic effects as random. Lin proposed the interaction sequence kernel association test (iSKAT) for assessing rare variants by environmental interactions, which is robust to the proportion of causal variants in a gene and the directions of the variants’ interaction effects. Liu ) developed a unified gene-based G × E test, which filters out variants with little evidence of interaction effects. This method was shown to improve the power of interaction analysis by means of an optimal filtering threshold. All these methods are designed for the analysis of interactions in a single study, which is often underpowered due to limited sample size. A very large number of samples are indeed necessary for identifying genetic variants, especially for finding G × Es (Smith and Day 1984; McCarthy ). To overcome this limitation, the sharing of results among consortia on the same disease or trait and meta-analyses that combine the results of multiple studies are routine practices (Panagiotou ). A meta-analysis that combines summary results rather than individual-level raw data from multiple independent studies offers an increased effective sample size and boosted statistical power (Evangelou and Ioannidis 2013; Hu ; Shi and Nehorai 2017; Jin and Shi 2019a, b). However, little work has been done on the meta-analysis of G × Es with rare variants. Wang ) suggested a gene-based meta-analysis with filtering to detect G × E effects in case-control studies. In this approach, a variant-level “meta-filtering” test is first conducted, and meta-analysis techniques are then applied to test gene-based G × E effects on the retained variants. INT-FIX and INT-RAN are gene-based G × E tests for rare variants in a single study, which were proposed by Chen ) and are implemented in the rareGE package. In this study, we develop corresponding meta-analysis methods based on INT-FIX and INT-RAN, which are two powerful and accessible methods for testing G × Es at the study level. Our meta-analysis methods combine gene-level summary statistics from studies via INT-FIX or INT-RAN tests and consider the G × E effect to be either homogeneous or heterogeneous across studies. In simulation studies, we have evaluated our methods by examining their null distributions and statistical power. Our methods have been further applied in testing 12 candidate gene–age interactions in blood pressure (BP) regulation (Simino ) using whole-exome sequencing data from UK Biobank (Zhao ).

Methods

INT-FIX and INT-RAN in individual studies

Suppose that there are studies in a meta-analysis for testing gene-based G × E effects of rare variants in a region. For the k-th study, individuals have been sequenced in the region, which has variants, and and denote the phenotype and genotypes of individual . Under the additive genetic model, , 1, or 2 are the numbers of minor alleles. The first element of the covariate vector is the intercept, with a value of 1, and the other elements are covariates. The environmental factor is included as one of the covariates for adjusting its main effect. Under the assumption of independent observations, consider the following linear mixed model for testing the gene by interaction as follows: where is an error term and represents the effects of the intercept and covariates. and are the main genetic effects and G × E effects, respectively. Assume that has a mean of 0 and a covariance matrix , where denotes the identity matrix. and denote the diagonal weight matrices with elements and (), which are the weights of the main genetic effects and G × E effects, respectively. Let be the phenotype vector, let be an diagonal matrix, and let be the error vector. The linear mixed model is written in matrix form as The null hypothesis with no interaction effect for any variant in the gene is : , and the alternative hypothesis of at least one variant having a nonzero interaction effect is . In the INT-FIX method (Chen ), the main genetic effects are assumed to be fixed effects, and the are the regression coefficients of the main genetic effects. Under the null hypothesis, where . By means of maximum likelihood or restricted maximum likelihood functions, and can be estimated; the corresponding estimates are denoted by and , respectively. Then, the estimated mean and covariance matrix of are and where . The INT-FIX statistics of testing the G × E interaction is where is the G × E score statistic for the j-th variant. The asymptotic distribution of is a mixture of chi-square distributions, and the P-value can be calculated via Kuonen’s saddlepoint method (Kuonen 1999; Chen ). In the INT-RAN method (Chen ), the main genetic effects are assumed to be random, and the elements of follow a normal distribution with zero mean and covariance matrix . Under the null hypothesis, the working model is written as (3), with , and , , and can be numerically estimated by means of maximum likelihood or restricted maximum likelihood functions (Chen ). The estimated mean and covariance matrix of are where and . The INT-RAN test statistic is where is the G × E score statistic for the j-th variant. The statistic asymptotically follows a mixture of chi-square distributions, and the corresponding P-value can again be computed via Kuonen’s saddlepoint method (Kuonen 1999; Chen ).

Meta-analysis of summary statistics from INT-FIX and INT-RAN

The meta-analysis of G × E effects is performed based on summary statistics from each study. The required summary statistics for each study include the MAFs of all variants, the G × E score statistic (5) or (7) for each variant and the regional G × E relationship matrix (Zhang and Lin 2003; Wu ). For the meta-analysis results of INT-FIX, the regional G × E relationship matrix of the k-th study is where and . For INT-RAN, the regional G × E relationship matrix of the k-th study is For convenience, we assume that all variants can be observed in each of the studies, that is, . When some variant is monomorphic in a study, its G × E score statistic and corresponding elements in the relationship matrices can be simply set to zero. We consider two types of meta-analyses, under scenarios in which the G × E effects are homogeneous and heterogeneous across different studies. For the scenario in which the G × E effects are homogeneous across the different studies, we propose the following G × E test statistics for the meta-analysis of summary statistics from INT-FIX and INT-RAN: Similar to the meta-analysis of homogeneous main genetic effects presented in Lee , these two statistics first combine the weighted G × E score statistics across different studies for each variant and then aggregate the squared score statistics of all variants in a gene. Here, the weights are based on the MAFs of the variants estimated across all studies. Under the null hypothesis, asymptotically follows a mixed 1-df chi-square distribution (Lee ; Chen ), namely, , where are the nonzero eigenvalues of and the P-value can be calculated via Kuonen’s saddlepoint method (Kuonen 1999). Similarly, we have the test statistic , where are the nonzero eigenvalues of For the scenario in which the G × E effects are assumed to be heterogeneous across the different studies, we propose the following G × E test statistics for the meta-analysis of the summary statistics from INT-FIX and INT-RAN: As in the meta-analysis of heterogeneous main genetic effects presented in Lee , these test statistics aggregate the squared and weighted score statistics across all studies and all variants. In HET-INT-FIX and HET-INT-RAN, the weights are based on the study-specific MAFs. Here, , where are the nonzero eigenvalues of , and , where are the nonzero eigenvalues of . The P-values of the mixed s can also be calculated via Kuonen’s saddlepoint method (Kuonen 1999). For trans-ethnic meta-analyses, we assume that the G × E effects in studies of the same ancestry are homogeneous, and heterogeneous for studies of different ancestries. Suppose that there are studies from ancestries. In the -th ancestry, there are studies, . Denote as the number of studies from ancestry 1 to , in particular, . We propose the following G × E test statistics for the trans-ethnic meta-analysis of the summary statistics from INT-FIX and INT-RAN: The test statistics combine the weighted score statistic of the -th variant from studies of the same ancestry, then aggregate the squared scores across all ancestries and all variants. Here, the weights are based on ancestry-specific MAFs. When all studies in the meta-analysis are of the same ancestry, Equations (16) and (17) reduce to Equations (10) and (11), respectively. When all studies have different ancestries, Equations (16) and (17) reduce to Equations (14) and (15), respectively.

Data availability

This research has been conducted using the UK Biobank Resource under Application Number 44080. Corresponding R codes for meta-analysis methods of testing G × E effects in this article are available at GitHub: https://github.com/jlyx53/Codes-for-Meta-analyses-of-G-E-effects.

Results

Numerical simulations

We conducted simulation studies to evaluate the null distributions and statistical power of our four meta-statistics for a continuous phenotype. We considered three scenarios as summarized in Table 1. Each has three studies with sample sizes of 1600, 2200, and 3200, referred to study 1, study 2, and study 3, respectively. For scenario 1, all studies are European (EUR) samples and have the same set of covariates; For scenario 2, three studies are all EUR samples but have different set of covariates; For scenario 3, three studies have different set of covariates and have different ancestries: the first two studies are EUR samples and the third study is African-American (AA) samples. In the power analysis, for each scenario, we considered cases with homogeneous and heterogeneous G × E effects across studies, and also five different levels of main and interaction effects.

Table 1

Simulation study settings for three scenarios

Scenario	Pop	Sample sizes			Covariates
Scenario	Pop	Study 1	Study 2	Study 3	Study 1	Study 2	Study 3
1	EUR	1,600	2,200	3,200	(BMI, age, gender)	(BMI, age, gender)	(BMI, age, gender)
2	EUR	1,600	2,200	3,200	(BMI)	(BMI, age)	(BMI, age, gender)
3	EUR+AA	1,600	2,200	3,200	(BMI)	(BMI, age)	(BMI, age, gender)

Pop, the populations of three studies. EUR, refers to all three studies are European samples. EUR+AA, refers to study 1 and study 2 are European samples and study 3 are African-American samples.

Simulation study settings for three scenarios Pop, the populations of three studies. EUR, refers to all three studies are European samples. EUR+AA, refers to study 1 and study 2 are European samples and study 3 are African-American samples. Using the calibrated coalescent model implemented in COSI (Schaffner ), we first simulated 10,000 EUR haplotypes and 10,000 AA haplotypes over a 200 kb region. The simulation parameters were set to mimic the linkage disequilibrium structure, local recombination rate and population history of the EUR and AA populations (Lee ). Then, for scenarios 1 and 2, we randomly paired the EUR haplotypes to obtain diploid genotype data of 10,000 EUR individuals and randomly selected 7000 out of the 10,000 individuals. For scenario 3, we randomly paired the EUR haplotypes to obtain diploid genotype data of 10,000 individuals and randomly selected 3800 out of them. Similarly, we obtained diploid genotype data of 10,000 AA individuals, out of which we randomly selected 3200 individuals. Since the average exon length of a gene is approximately 3 kb (Pruitt ), we randomly selected a 3 kb subregion within the 200 kb region for each replicate of the genotype data and retained variants with MAFs less than 0.03. We repeated this process 1000 times and generated 1000 replicates of genotype data sets for the three scenarios. On average, there were 54 rare variants in the 3 kb region under scenarios 1 and 2 and 87 under scenario 3. For scenarios 1 and 2, out of the genotype data of the 7000 EUR individuals, we randomly selected 1600, 2200, and 3200 for the samples of study 1, study 2, and study 3, respectively. For scenario 3, out of the genotype data of the 3800 EUR individuals, we randomly selected 1600 and 2200 for the samples of study 1 and study 2, respectively. To evaluate the null distributions of the proposed meta-analysis statistics, we generated phenotype data sets under the null model. To reduce the computational burden, we simulated 20 replicates of covariates and phenotype for each of the 1000 genotype sets, which yielded 20,000 genotype-phenotype data sets. For scenario 1, the continuous phenotypes for the -th () study were generated by means of the following linear model: where is a binary covariate vector in which each element is drawn from a Bernoulli distribution with probability 0.5; and are continuous covariate vectors in which each element is normally distributed, and , for ; and represents the random errors, with each element following a standard normal distribution. For scenarios 2 and 3, we generated the phenotypes of study 1 according to Equation (18) by removing the age and gender effects, and generated the phenotypes of study 2 by removing the gender effect. The phenotypes of study 3 were generated with the full covariates. Under the null hypothesis, the genotypes are not associated with the phenotype. To evaluate the statistical power of our proposed meta-analysis methods, we generated phenotypes under the alternative model based on the 1000 genotype data sets described previously. For each of the three scenarios, we considered five different levels of genetic main and interaction effects: level 1 refers to the existence of only genetic main effects but without interaction effects, level 2 refers to the existence of genetic main effects and weak interaction effects, level 3 refers to the existence of both genetic main effects and interaction effects, level 4 refers to the existence of interaction effects and weak genetic main effects, level 5 refers to the existence of only interaction effects but without genetic main effects. For each of the five levels, we assumed that 20, 40, or 60% of the rare variants were causal variants. We simulated one replicate of phenotype and covariates for each genotype data set. The covariates followed the same distributions described previously. For scenario 1, the phenotypes for the -th study were generated by means of the following linear model: where is the genotype matrix of the causal variants in study and has elements . We used body mass index (BMI) as the environmental factor , which was centered to have a mean of 0. represents the effects of gene–BMI interaction, with elements . For scenarios 2 and 3, partial covariates were used as described in Table 1. The values of and for the five levels of genetic main and interaction effects were: level 1 with , level 2 with , level 3 with , level 4 with , level 5 with . For the case of homogeneous variant and interaction effects across studies, under all the three scenarios, we simulated the same variant effects and gene-BMI interaction effects across all studies, that is, and . For the case of heterogeneous variant and interaction effects across studies, under scenarios 1 and 2, we simulated the variant effects and gene-BMI interaction effects for each study independently. Under scenario 3, we simulated the same variant effects and gene-BMI interaction effects for study 1 and study 2 that have the EUR ancestry, and simulated the variant effects and gene-BMI interaction effects for study 3 with AA ancestry separately. In all simulations, the variant weights followed Beta distributions, , as suggested by Wu , where denotes the MAF of variant . For scenarios 1 and 2, was estimated based on all studies in HOM-INT-FIX and HOM-INT-RAN and was specific to each study in HET-INT-FIX and HET-INT-RAN. For scenario 3, was estimated based on all studies in HOM-INT-FIX and HOM-INT-RAN and was estimated based on ancestry-specific studies in HET-INT-FIX and HET-INT-RAN. The gene-BMI interaction was considered significant if its P-value was less than the significance level of , and empirical power was calculated as the proportion of significant results among 1000 replicates.

Null distributions of the meta-statistics

We examined the distributions of our proposed HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN meta-statistics using the data sets simulated under the null hypothesis. We first computed the G × E score statistics of each study according to formulas (5) and (7) and the regional G × E relationship matrix of each study according to formulas (8) and (9). Then, for scenarios 1 and 2, we combined the study-level G × E score statistics using formulas (10)–(11) and (14)–(15) to obtain the four meta-statistics. For scenario 3, we combined the study-level G × E score statistics using formulas (10)–(11) and (16)–(17). We computed the regional G × E relationship matrices for the meta-analysis by combining the regional relationship matrices from each study according to Equations (12) and (13). The P-values of the four meta-statistics were calculated according to their theoretical distributions and Kuonen’s saddlepoint method. The distributions of the empirical P-values under the null hypothesis were compared with the expected uniform distribution between 0 and 1. Quantile–quantile (Q–Q) plots of the meta-analysis statistics for testing G × E effects under the three scenarios are shown in Figures 1–3, respectively. It can be observed that under all of the three scenarios, the empirical distributions of our four proposed statistics match well with their expected theoretical distributions.

Figure 1

Q–Q plots of the null distributions of the meta-statistics under scenario 1. The horizontal axis represents the negative log10 of the expected P-values, and the vertical axis represents the negative log10 of the observed P-values. (A) HOM-INT-FIX; (B) HOM-INT-RAN; (C) HET-INT-FIX; (D) HET-INT-RAN. Q–Q plots of the null distributions of the meta-statistics under scenario 2. The horizontal axis represents the negative log10 of the expected P-values, and the vertical axis represents the negative log10 of the observed P-values. (A) HOM-INT-FIX; (B) HOM-INT-RAN; (C) HET-INT-FIX; (D) HET-INT-RAN. Q–Q plots of the null distributions of the meta-statistics under scenario 3. The horizontal axis represents the negative log10 of the expected P-values, and the vertical axis represents the negative log10 of the observed P-values. (A) HOM-INT-FIX; (B) HOM-INT-RAN; (C) HET-INT-FIX; (D) HET-INT-RAN.

Statistical power of the meta-statistics

The results for the statistical power under three scenarios are shown in Figures 4–6, respectively, which include results from five levels of main and interaction effects and three proportions of causal variants. In each figure, the power values on the left are computed based on the simulated data sets in which the variant and gene–BMI interaction effects were the same across studies. The results on the right are based on the data sets with heterogeneous variant and gene–BMI interaction effects across studies.

Figure 4

Statistical power of meta-analyses with five levels of genetic main and interaction effects and three proportions of causal rare variants under scenario 1. The horizontal axis represents the level of genetic main and interaction effects, and the vertical axis represents the statistical power. (A) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 20% causal rare variants. (B) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 20% causal rare variants. (C) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 40% causal rare variants. (D) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 40% causal rare variants. (E) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 60% causal rare variants. (F) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 60% causal rare variants. Statistical power of meta-analyses with five different levels of genetic main and interaction effects and three proportions of causal rare variants under scenario 2. The horizontal axis represents the level of genetic main and interaction effects, and the vertical axis represents the statistical power. (A) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 20% causal rare variants. (B) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 20% causal rare variants. (C) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 40% causal rare variants. (D) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 40% causal rare variants. (E) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 60% causal rare variants. (F) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 60% causal rare variants. Statistical power of meta-analyses with five levels of genetic main and interaction effects and three proportions of causal rare variants under scenario 3. The horizontal axis represents the level of genetic main and interaction effects, and the vertical axis represents the statistical power. (A) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 20% causal rare variants. (B) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 20% causal rare variants. (C) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 40% causal rare variants. (D) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 40% causal rare variants. (E) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 60% causal rare variants. (F) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 60% causal rare variants. As can be seen that the statistical powers of the four proposed meta-statistics are close to 0 for interaction effects of levels 1 and 2 where there are no or weak interaction effects. For the interaction effects of levels 3–5, the powers are approximately the same while keeping meta statistics, simulation scenarios and proportions of causal variants fixed. Power results under scenario 1 are almost the same as scenario 2, which suggests that meta-analyses of studies with different covariates distributions had little influence on the power. This is because that the INT-FIX and INT-RAN statistics at study level are essentially based on the phenotypic residuals after adjusting the covariates. Statistical powers under scenario 3 are in general greater than those under scenario 1 or 2, which is due to that AA samples are included and average number of causal variants is larger than that under scenario 1 or 2. It can be observed that HOM-INT-FIX and HOM-INT-RAN have similar power, and the power of HOM-INT-RAN is slightly greater than that of HOM-INT-FIX. Similar observations can be made for HET-INT-FIX and HET-INT-RAN. This is because INT-FIX and INT-RAN have almost the same power at the study level, which was demonstrated by Chen . When the simulated data sets are of homogeneous interaction effects, we can see from the left panels of Figures 4 and 5 that HOM-INT-FIX and HOM-INT-RAN have larger power than those of HET-INT-FIX and HET-INT-RAN. For instance, under scenario 1 with interaction effects of level 3, the power of HOM-INT-RAN is 0.401, 0.671, and 0.834 for 20, 40, and 60% of causal variants, respectively, which are 0.375, 0.637, and 0.811 for HET-INT-RAN. Therefore, HOM-INT-FIX and HOM-INT-RAN are preferable to HET-INT-FIX and HET-INT-RAN for meta-analyses of studies that are highly comparable, for which the underlying interaction effects are likely to be homogeneous.

Figure 5

Statistical power of meta-analyses with five different levels of genetic main and interaction effects and three proportions of causal rare variants under scenario 2. The horizontal axis represents the level of genetic main and interaction effects, and the vertical axis represents the statistical power. (A) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 20% causal rare variants. (B) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 20% causal rare variants. (C) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 40% causal rare variants. (D) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 40% causal rare variants. (E) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 60% causal rare variants. (F) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 60% causal rare variants.

Under scenario 3, we can see from the left panels of Figure 6 that HET-INT-FIX and HET-INT-RAN have slightly greater or nearly the same power as HOM-INT-FIX and HOM-INT-RAN because of the heterogeneous effects between EUR and AA samples. When the simulated data sets are based on heterogeneous variant and gene–BMI interaction effects across studies, as can be depicted from the right panels of Figures 4–6 that HET-INT-FIX and HET-INT-RAN have larger power than those of HOM-INT-FIX and HOM-INT-RAN for the same proportion of causal variants under all scenarios. For instance, under scenario 1 with interaction effects of level 3, HOM-INT-RAN has power values of 0.304, 0.526, and 0.705 for three proportions of causal variants, respectively, while HET-INT-RAN has corresponding power values of 0.523, 0.797, and 0.925. Not surprisingly, HET-INT-FIX and HET-INT-RAN are superior to HOM-INT-FIX and HOM-INT-RAN in this case, since the interaction effects are heterogeneous across studies in the simulated data sets and the former two approaches appropriately account for such heterogeneity.

Figure 6

Statistical power of meta-analyses with five levels of genetic main and interaction effects and three proportions of causal rare variants under scenario 3. The horizontal axis represents the level of genetic main and interaction effects, and the vertical axis represents the statistical power. (A) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 20% causal rare variants. (B) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 20% causal rare variants. (C) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 40% causal rare variants. (D) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 40% causal rare variants. (E) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with homogeneous interaction effects across studies and 60% causal rare variants. (F) Power of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN with heterogeneous interaction effects across studies and 60% causal rare variants.

To evaluate the statistical efficiency of the proposed meta-analysis methods, we compared the statistical power of HOM-INT-FIX and HOM-INT-RAN with that of pooled analyses conducted based on interaction tests with individual-level data from all studies under scenario 1. The results with 40% causal variants, interaction effects of level 3 and homogeneous interaction effects across studies are shown in Figure 7. The P-values of the HOM-INT-FIX meta-analysis and the pooled analysis with the INT-FIX test are compared in Figure 7A. As we can see, almost all points lie on a diagonal line, indicating that the HOM-INT-FIX meta-analysis is equally as powerful as the single pooled interaction test based on INT-FIX. Similarly, Figure 7B shows that the HOM-INT-RAN meta-analysis is equally as powerful as the pooled INT-RAN analysis. Because the pooled INT-FIX and INT-RAN analyses implicitly assume the same interaction effect across studies, these results demonstrate that there is no power loss with the HOM-INT-FIX and HOM-INT-RAN meta-analyses when this assumption is satisfied.

Figure 7

P-values of HOM-INT-FIX, HOM-INT-RAN, Pooled-INT-FIX and Pooled-INT-RAN with homogeneous interaction effects across studies, interaction effects of level 3 and 40% causal rare variants under scenario 1. (A) P-values of HOM-INT-FIX and Pooled-INT-FIX. The horizontal axis represents the negative log10P-values of Pooled-INT-FIX, and the vertical axis represents the negative log10P-values of HOM-INT-FIX. (B) P-values of HOM-INT-RAN and Pooled-INT-RAN. The horizontal axis represents the negative log10P-values of Pooled-INT-RAN, and the vertical axis represents the negative log10P-values of HOM-INT-RAN. We also compared the statistical power of the HET-INT-FIX and HET-INT-RAN meta-analyses with that of the pooled INT-FIX and INT-RAN analyses, respectively. For scenario 1 with 40% causal variants, interaction effects of level 3 and heterogeneous interaction effects across studies, the results are shown in Figure 8. As we can see, the P-values of the HET-INT-FIX and HET-INT-RAN meta-analyses are smaller than those of the pooled INT-FIX and INT-RAN analyses. Therefore, HET-INT-FIX and HET-INT-RAN are more powerful than the pooled analyses when heterogeneity of the interaction effect exists. This is because the HET-INT-FIX and HET-INT-RAN meta-analyses treat genetic heterogeneity appropriately when synthesizing the summary results from multiple studies. By contrast, the pooled individual-level data contain mixed interaction effects for the same variants, thus violating the underlying assumptions of INT-FIX and INT-RAN.

Figure 8

P-values of HET-INT-FIX, HET-INT-RAN, Pooled-INT-FIX, and Pooled-INT-RAN with heterogeneous interaction effects across studies, interaction effects of level 3 and 40% causal rare variants under scenario 1. (A) P-values of HET-INT-FIX and Pooled-INT-FIX. The horizontal axis represents the negative log10P-values of Pooled-INT-FIX, and the vertical axis represents the negative log10P-values of HET-INT-FIX. (B) P-values of HET-INT-RAN and Pooled-INT-RAN. The horizontal axis represents the negative log10P-values of Pooled-INT-RAN, and the vertical axis represents the negative log10P-values of HET-INT-RAN.

Gene–age interactions in blood pressure traits with UK Biobank data

UK Biobank is a prospective cohort study involving approximately 500,000 volunteers in the United Kingdom aged between 40 and 69 years, from whom extensive genetic and phenotypic data have been collected (Sudlow ; Bycroft ). The latest release of whole-exome sequencing data contains data from 200,643 participants. We extracted participants with European, African or Asian ancestry and excluded participants who had withdrawn and one member of each pair of first- or second-degree relatives. The average SBP and diastolic blood pressure (DBP) measurements at the first visit were used. For participants taking antihypertensive medications at the time of the visit, 10 and 5 mm Hg were added to their SBP and DBP measurements, respectively (Cui ). Mean arterial pressure (MAP) and pulse pressure (PP) were calculated as and , respectively, and the latter was then logarithmically transformed. Participants with missing BP phenotypes or covariates, including age, BMI and sex, were excluded. Phenotype and covariate outliers were also excluded, where these were defined as data points lying at least 5 standard deviations away from the corresponding means. In summary, a total of 162,148 European samples, 3113 African samples and 4745 Asian samples with complete phenotype, covariates, and exome genotype data were included in our analyses. To conduct meta-analyses of multiple studies of the same ancestry, we divided the European samples into five groups based on the geographic locations of their assessment centers: group 1 included participants from assessment centers in Edinburgh, Glasgow, Middlesbrough, and Newcastle; group 2 was from Barts, Croydon, Hounslow, Oxford, and Reading; group 3 was from Bristol, Cardiff, and Swansea; group 4 was from Leeds, Sheffield, Nottingham, and Birmingham; and group 5 was from Bury, Cheadle, Liverpool, Manchester, Stockport, Stoke, and Wrexham. For trans-ethnic meta-analyses, group 6 of African samples and group 7 of Asian samples were included. The characteristics of samples in the seven groups are presented in Table 2.

Table 2

Characteristics of UK Biobank samples in seven groups

Group	N	Male (%)	HTx (%)	Age (years)		BMI (kg/m²)		SBP (mmHg)		DBP (mmHg)
Group	N	Male (%)	HTx (%)	Mean	SD	Mean	SD	Mean	SD	Mean	SD
1	27,154	44.0	21.4	56.8	8.0	27.5	4.7	139.7	18.8	83.2	10.1
2	31,282	44.5	18.4	57.0	8.0	26.7	4.6	135.3	18.0	81.1	10.0
3	18,932	43.5	18.9	56.1	8.2	27.3	4.7	139.3	18.5	83.1	10.0
4	52,721	46.0	20.7	57.1	7.9	27.5	4.7	138.3	18.4	82.2	10.0
5	32,059	47.0	21.5	56.9	7.9	27.6	4.7	138.4	18.4	82.0	10.0
6	3,113	41.5	31.9	52.0	8.0	29.5	5.3	138.0	18.6	84.9	10.7
7	4,745	50.0	22.9	53.0	8.4	26.7	4.3	134.3	18.6	82.5	10.3

N, number of participants; HTx, percentage of individuals on antihypertensive medications at the time of clinic visit; BMI, body mass index; DBP, diastolic blood pressure; SBP, systolic blood pressure; SD, standard deviation.

Characteristics of UK Biobank samples in seven groups N, number of participants; HTx, percentage of individuals on antihypertensive medications at the time of clinic visit; BMI, body mass index; DBP, diastolic blood pressure; SBP, systolic blood pressure; SD, standard deviation. We selected nine loci that showed nominal evidence (P < 0.05) of SNP–age interaction in a genome-wide search of common variants with age-dependent effects in BP regulation (Simino ). For the reported leading SNPs that are in gene regions, the corresponding genes were selected for analyzing gene–age interaction with rare variants; otherwise, the nearest up- and downstream genes were chosen. In total, 12 candidate genes were selected from the nine loci. The variants of the 12 genes were annotated with VEP (McLaren ), and those annotated as stop_loss, missense_variant, start_lose, splice_donor_variant, inframe_deletion, frameshift_variant, splice_acceptor_variant, stop_gained, or inframe_insertion were used for analysis. In addition, variants that were PolyPhen or SIFT benign, defined as variants with PolyPhen scores smaller than 0.15 or SIFT scores larger than 0.05 (Cirulli ), were excluded. PLINK (Chang ) was used to extract rare variants with MAFs smaller than 0.03, and fcGENE (Roshyara and Scholz 2014) was used to convert genotypes into numeric values. The numbers of variants in the genes used for the single and multiple ancestries meta-analyses are shown in Tables 3 and 4, respectively.

Table 3

P-values of meta-analyses and pooled analyses of gene–age interaction in BP traits for the five geographic groups of European ancestry in the UK Biobank data

Loci	SNP	Up/down stream gene	RV Num	Trait	P-value
Loci	SNP	Up/down stream gene	RV Num	Trait	HOM-INT-FIX	HOM-INT-RAN	HET-INT-FIX	HET-INT-RAN	Pooled-INT-FIX	Pooled-INT-RAN
1	rs880315	CASZ1	145	SBP	0.95	0.96	0.63	0.67	0.94	0.93
2	rs6797587	CDC25A	25	MAP	0.03	0.06	0.08	0.10	0.04	0.05
3	rs11099098	PRDM8	59	SBP	0.82	0.92	0.93	0.96	0.88	0.92
3	rs11099098	FGF5	7	SBP	0.41	0.52	0.24	0.30	0.50	0.46
4	rs198846	HIST1H1T	71	DBP	0.18	0.19	0.27	0.25	0.17	0.19
5	rs12705390	CCDC71L	7	PP	0.70	0.95	0.85	0.96	0.60	0.94
5	rs12705390	PIK3CG	65	PP	0.82	0.79	0.40	0.48	0.76	0.73
6	rs7070797	C10orf107	19	MAP	0.07	0.06	0.04	0.13	0.06	0.10
6	rs7070797	ARID5B	60	MAP	0.94	0.95	0.74	0.94	0.89	0.91
7	rs4601790	EHBP1L1	74	MAP	0.05	0.06	0.11	0.10	0.07	0.06
8	rs11072518	COX5A	16	MAP	0.54	0.55	0.65	0.66	0.56	0.57
9	rs17608766	GOSR2	12	PP	0.22	0.20	0.69	0.71	0.20	0.15

RV Num, number of rare variants; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; PP, pulse pressure.

Table 4

P-values of meta-analyses and pooled analyses of gene–age interaction in BP traits for the seven groups of African, Asian, and European ancestries in the UK Biobank data

Loci	SNP	Up/down stream gene	RV Num	Trait	P-value
Loci	SNP	Up/down stream gene	RV Num	Trait	HOM-INT-FIX	HOM-INT-RAN	HET-INT-FIX	HET-INT-RAN	Pooled-INT-FIX	Pooled-INT-RAN
1	rs880315	CASZ1	164	SBP	0.96	0.97	0.95	0.96	0.93	0.93
2	rs6797587	CDC25A	28	MAP	0.03	0.06	0.03	0.06	0.05	0.05
3	rs11099098	PRDM8	66	SBP	0.83	0.88	0.83	0.94	0.84	0.88
3	rs11099098	FGF5	7	SBP	0.41	0.52	0.41	0.52	0.52	0.47
4	rs198846	HIST1H1T	85	DBP	0.24	0.26	0.19	0.21	0.19	0.13
5	rs12705390	CCDC71L	7	PP	0.95	0.99	0.95	0.98	0.87	0.99
5	rs12705390	PIK3CG	73	PP	0.82	0.79	0.86	0.83	0.82	0.75
6	rs7070797	C10orf107	24	MAP	0.15	0.13	0.10	0.09	0.24	0.35
6	rs7070797	ARID5B	64	MAP	0.87	0.94	0.88	0.94	0.75	0.95
7	rs4601790	EHBP1L1	91	MAP	0.05	0.05	0.06	0.06	0.02	0.03
8	rs11072518	COX5A	16	MAP	0.54	0.58	0.54	0.56	0.64	0.54
9	rs17608766	GOSR2	13	PP	0.43	0.44	0.39	0.37	0.45	0.45

RV Num, number of rare variants; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; PP, pulse pressure.

P-values of meta-analyses and pooled analyses of gene–age interaction in BP traits for the five geographic groups of European ancestry in the UK Biobank data RV Num, number of rare variants; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; PP, pulse pressure. P-values of meta-analyses and pooled analyses of gene–age interaction in BP traits for the seven groups of African, Asian, and European ancestries in the UK Biobank data RV Num, number of rare variants; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; PP, pulse pressure. For each of the 12 genes, we first conducted INT-FIX and INT-RAN analyses on each of the seven groups with the primary BP traits reported in Simino and obtained summary results. Age, sex, BMI, and the first 10 principal components were used as covariates, and age was used as the “environmental” variable. The summary results from the five groups of European ancestry were then combined by means of HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN. For comparison, we performed INT-FIX and INT-RAN analyses on all samples of the five groups, denoted by Pooled-INT-FIX and Pooled-INT-RAN. The P-values of the four meta-analyses and the two pooled analyses are displayed in Table 3. For the trans-ethnic meta-analyses based on summary results from the seven groups, we also conducted the four meta-analyses: HOM-INT-FIX, HOM-INT-RAN, HET-INT-FIX, and HET-INT-RAN, and the Pooled-INT-FIX and Pooled-INT-RAN, whose P-values are displayed in Table 4. Similar to the simulation results, HOM-INT-FIX and HOM-INT-RAN show approximately the same P-values, and the P-values of HET-INT-FIX and HET-INT-RAN are close in Tables 3 and 4. HOM-INT-FIX yields P-values approximately equal to those of Pooled-INT-FIX, and HOM-INT-RAN has roughly the same P-values as Pooled-INT-RAN in Table 3. Moreover, HOM-INT-FIX and HOM-INT-RAN have smaller P-values than HET-INT-FIX and HET-INT-RAN for most of the 12 genes. This suggests that the interaction effects, if they are genuine, are homogeneous across the five groups of European ancestry in UK Biobank. Two genes show nominal evidence of interaction with age in the meta-analyses. CDC25A and C10orf107 have P-values of 0.03 and 0.04 in HOM-INT-FIX and HET-INT-FIX, respectively. After correction for multiple testing, these results are no longer significant. In Table 4, the P-values of HOM-INT-FIX are larger than those of HET-INT-FIX, and the P-values of HOM-INT-RAN are larger than those of HET-INT-RAN, which suggests that the interaction effects, if they exist, are heterogeneous across ancestry groups in the UK Biobank data. Besides, the P-values of Pooled-INT-FIX are larger than those of HET-INT-FIX, which also indicates that the interaction effects are heterogeneous. CDC25A has P-value of 0.03 in both HOM-INT-FIX and HET-INT-FIX, which are no longer significant after the correction for multiple testing.

Discussion

In this study, we propose four meta-analysis methods for testing G × E effects with rare variants while treating main genetic effects as either fixed or random and considering homogeneous or heterogeneous G × E effects across studies. Simulations as well as an analysis of UK Biobank data demonstrate that treating variant main effects as either fixed or random provides approximately the same statistical power. The results are consistent with the conclusion in Chen that INT-FIX has almost the same power as INT-RAN when analyzing interaction in a single study. It has been suggested that when the number of genetic variants is large, INT-FIX will lead to unstable estimates of the main genetic effects and, thus, INT-RAN is recommended (Chen ). Therefore, our HOM-INT-FIX and HET-INT-FIX meta-analyses are valid only if INT-FIX is applicable for the study-level analysis. Otherwise, INT-RAN should be used in the studies, and HOM-INT-RAN or HET-INT-RAN should be used for the meta-analysis. We developed meta-analyses for testing G × E effects by combining summary results from INT-FIX and INT-RAN (Chen ) at study level based on the analysis framework proposed by Lee . The methods offer much larger sample size for testing the interaction which is impossible for a single study. They were shown to be statistical efficient in our simulation studies as well as the analyses of UK Biobank data. It is well established that for the meta-analysis of association statistics with common variants, the power loss is minimal (Lin and Zeng 2010). However, the power loss of meta-analysis for rare variants is largely unexamined and has been suspected to be possibly more sizable (Panagiotou ). In this study, we have compared the HOM-INT-FIX and HOM-INT-RAN meta-analyses with pooled analyses based on INT-FIX and INT-RAN, respectively. Our results show that they have approximately the same power. In gene–age interaction analyses of UK Biobank data, the results from the meta-analyses and the pooled analyses are close. Therefore, the power loss of meta-analysis for rare variants is concluded to be minimal as well. In the gene–age interaction analysis of UK Biobank data, none of the 12 genes from the nine candidate loci shows experiment-wide significant results. There are many possible reasons. First, most GWAS SNPs are from noncoding regions, and some of them may play regulatory roles by changing the expression levels of the modulated genes (Edwards ). Nevertheless, the rare variants selected in our analyses likely alter the protein sequences that the genes express, which are not necessarily correlated with the regulatory SNPs. Second, it has been estimated that only approximately one-third of causal genes are the nearest genes to the GWAS loci (Gusev ; Zhu ). Thus, we may have missed the majority of the causal genes, which are located farther away from the candidate loci. Finally, the nine loci identified in the genome-wide analysis show only nominal evidence of interactions, which are not genome-wide significant. Therefore, some of the loci considered in the analyses may represent spurious interaction effects. In this study, we have demonstrated the proposed meta-analyses for continuous traits. The methods can be readily generalized to the case of binary traits. For binary traits, logistic versions of INT-FIX and INT-RAN can be applied at the study level, and the G × E score statistics for single variants and the relationship matrices can be computed based on (5) and (7)–(9), the same formulas as for continuous traits. There is no difference in how the meta-statistics are computed for binary and continuous traits. However, when studies have unbalanced case-control ratios and minor allele counts in a gene are very low, using saddlepoint approximation can result in inaccurate P-values for binary traits, and efficient resampling can be used instead (Lee ). Our proposed meta-analyses are limited to testing G × E effects only. Joint testing of main genetic effects and interaction effects has long been suggested for the interaction analysis of common variants (Kraft ; Manning ). Joint testing offers better power than the analysis of main genetic effects only and the analysis of interaction effects only when both types of effects exist. In future work, we will further extend our meta-analyses to allow the joint testing of main genetic effects and G × E effects.

Conclusions

In this study, we proposed four powerful meta-analysis methods for testing G × E effects with rare variants. We considered both homogeneous G × E effects and heterogeneous G × E effects across studies and efficiently combined the summary statistics of INT-FIX and INT-RAN from multiple studies. Through simulations and real data analysis, we demonstrated that our approaches provide power comparable to that of pooled analysis when the interaction effects are homogeneous across studies. When heterogeneity exists across studies, our approaches can treat heterogeneity appropriately and achieve greater statistical power than a pooled analysis. Our meta-analysis methods of testing G × E effects can be applied to synthesize results from multiple diverse studies to increase the effective sample size and improve the chance of identifying genes whose effects are modified by an environmental factor.

Funding

This work was supported by the national Thousand Youth Talents Plan.

50 in total

1. Calibrating a coalescent simulation of human genome sequence variation.

Authors: Stephen F Schaffner; Catherine Foo; Stacey Gabriel; David Reich; Mark J Daly; David Altshuler
Journal: Genome Res Date: 2005-11 Impact factor: 9.043

Review 2. Next-generation DNA sequencing techniques.

Authors: Wilhelm J Ansorge
Journal: N Biotechnol Date: 2009-02-03 Impact factor: 5.079

Review 3. Genome-wide association studies for complex traits: consensus, uncertainty and challenges.

Authors: Mark I McCarthy; Gonçalo R Abecasis; Lon R Cardon; David B Goldstein; Julian Little; John P A Ioannidis; Joel N Hirschhorn
Journal: Nat Rev Genet Date: 2008-05 Impact factor: 53.242

4. ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies.

Authors: Yaowu Liu; Sixing Chen; Zilin Li; Alanna C Morrison; Eric Boerwinkle; Xihong Lin
Journal: Am J Hum Genet Date: 2019-03-07 Impact factor: 11.025

5. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets.

Authors: Zhihong Zhu; Futao Zhang; Han Hu; Andrew Bakshi; Matthew R Robinson; Joseph E Powell; Grant W Montgomery; Michael E Goddard; Naomi R Wray; Peter M Visscher; Jian Yang
Journal: Nat Genet Date: 2016-03-28 Impact factor: 38.330

6. The design of case-control studies: the influence of confounding and interaction effects.

Authors: P G Smith; N E Day
Journal: Int J Epidemiol Date: 1984-09 Impact factor: 7.196

7. Test for interactions between a genetic marker set and environment in generalized linear models.

Authors: Xinyi Lin; Seunggeun Lee; David C Christiani; Xihong Lin
Journal: Biostatistics Date: 2013-03-05 Impact factor: 5.899

8. A groupwise association test for rare mutations using a weighted sum statistic.

Authors: Bo Eskerod Madsen; Sharon R Browning
Journal: PLoS Genet Date: 2009-02-13 Impact factor: 5.917

9. Low-frequency and rare variants may contribute to elucidate the genetics of major depressive disorder.

Authors: Chenglong Yu; Mauricio Arcos-Burgos; Bernhard T Baune; Volker Arolt; Udo Dannlowski; Ma-Li Wong; Julio Licinio
Journal: Transl Psychiatry Date: 2018-03-27 Impact factor: 6.222

10. The UK Biobank resource with deep phenotyping and genomic data.

Authors: Clare Bycroft; Colin Freeman; Desislava Petkova; Gavin Band; Lloyd T Elliott; Kevin Sharp; Allan Motyer; Damjan Vukcevic; Olivier Delaneau; Jared O'Connell; Adrian Cortes; Samantha Welsh; Alan Young; Mark Effingham; Gil McVean; Stephen Leslie; Naomi Allen; Peter Donnelly; Jonathan Marchini
Journal: Nature Date: 2018-10-10 Impact factor: 49.962