Literature DB >> 29198721

A Selection Operator for Summary Association Statistics Reveals Allelic Heterogeneity of Complex Traits.

Zheng Ning¹, Youngjo Lee², Peter K Joshi³, James F Wilson⁴, Yudi Pawitan¹, Xia Shen⁵.

Abstract

In recent years, as a secondary analysis in genome-wide association studies (GWASs), conditional and joint multiple-SNP analysis (GCTA-COJO) has been successful in allowing the discovery of additional association signals within detected loci. This suggests that many loci mapped in GWASs harbor more than a single causal variant. In order to interpret the underlying mechanism regulating a complex trait of interest in each discovered locus, researchers must assess the magnitude of allelic heterogeneity within the locus. We developed a penalized selection operator for jointly analyzing multiple variants (SOJO) within each mapped locus on the basis of LASSO (least absolute shrinkage and selection operator) regression derived from summary association statistics. We found that, compared to stepwise conditional multiple-SNP analysis, SOJO provided better sensitivity and specificity in predicting the number of alleles associated with complex traits in each locus. SOJO suggested causal variants potentially missed by GCTA-COJO. Compared to using top variants from genome-wide significant loci in GWAS, using SOJO increased the proportion of variance prediction for height by 65% without additional discovery samples or additional loci in the genome. Our empirical results indicate that human height is not only a highly polygenic trait, but also has high allelic heterogeneity within its established hundreds of loci.

Entities: Chemical Disease Gene Species

Keywords: LASSO; allelic heterogeneity; complex traits; fine-mapping; genome-wide association study; human height; missing heritability; prediction; shrinkage estimator; summary association statistics

Mesh：

Year: 2017 PMID： 29198721 PMCID： PMC5812891 DOI： 10.1016/j.ajhg.2017.09.027

Source DB: PubMed Journal: Am J Hum Genet ISSN： 0002-9297 Impact factor: 11.025

Introduction

Genome-wide association studies (GWASs) have successfully identified many genetic variants that regulate complex traits. However, the associations between a complex trait and genetic variants, such as single-nucleotide polymorphisms (SNPs), are usually very small relative to noise. Thus, GWASs often require large sample sizes to achieve sufficient power, and substantial efforts have been spent on the development of statistical methods to boost GWAS discovery power. Given the legal restrictions on public sharing of individual-level data, it is rarely feasible to pool individual-level data from a number of different cohorts. In spite of this, GWAS summary-level data, in the form of association statistics, are mostly meta-analyzed and reported. Hence, the recent focus in methodology has been on meta-analysis techniques that use summary-level data based on established results to extract further knowledge. Based on association statistics, a few state-of-the-art methods, such as summary-level Mendelian randomization (SMR) analysis for candidate-gene-target prediction; LD score regression (LDSC) for polygenicity detection, heritability, and genetic correlation estimation; and conditional and joint multiple-SNP analysis (GCTA-COJO) for detection of independent associations within quantitative trait loci (QTL) discovered in GWAS, have been developed. In GWASs, if the single most statistically significant variant at a locus is reported, we only capture all the genetic variance—i.e., there is no missing heritability at the locus—when two assumptions hold: (1) there is only one underlying causal variant at the locus, and (2) the causal variant is in complete linkage disequilibrium (LD) with the top variant. However, these two assumptions can both be questioned: (1) there might be multiple causal variants or alleles at the locus so that a single variant cannot account for all the genetic variance at the locus. The phenomenon wherein multiple causal variants or alleles for a particular trait are located at the same locus is known as allelic heterogeneity (AH), whose presence in various complex diseases is reported in a recent study. (2) Even if there is only one underlying causal variant at the locus, a single top variant cannot capture all the genetic variance if the LD between the top variant and the causal variant is incomplete. To identify secondary association signals, many GWAS meta-analyses have used conditional analysis such as GCTA-COJO. GCTA-COJO performs a secondary association analysis conditioned on discovered top variants; such conditional analysis is conducted with GWAS meta-analysis summary statistics rather than individual-level data of the full sample. In recent analyses conducted by global consortia such as GIANT and DIAGRAM, GCTA-COJO was successful in detecting multiple associations in LD at the same loci.6, 7, 8, 9 However, the forward stepwise selection procedure, such as that implemented in GCTA-COJO, is known to be overly “greedy”; it is prone to eliminating useful predictors that happen to be correlated with selected predictors. This indicates that GCTA-COJO might miss some informative variants as a result of their LD with detected variants. More variants can be discovered when the discovery p value threshold is less stringent in GCTA-COJO. But as a fixed-effect model-selection strategy, there is a risk of overfitting for GCTA-COJO, especially when too many predictors are included in the model as p value threshold is increased. There is theoretical and empirical evidence that simultaneous modeling of multiple predictors with penalization provides a better variable selection procedure than the forward stepwise selection. In this framework, the least absolute shrinkage and selection operator (LASSO) was introduced and applied to variable selection problems in various disciplines.13, 14 Instead of only considering the square loss function , LASSO takes the -norm regularization into account and solveswhere the tuning parameter . Intuitively, the term is a penalization: the larger is, the larger the penalty imposed on the coefficients. This makes LASSO allow large coefficients only when they lead to a substantially better fit. LASSO leads to better interpretability and prediction accuracy. Because of regularization, LASSO has the ability to perform variable selection and get parsimonious results. Besides, as a shrinkage method, LASSO alleviates overfitting problems by performing a more reasonable bias-variance trade-off, which allows LASSO to include more informative predictors in the model without serious overfitting. The LARS algorithm and regularization path algorithm provide computationally fast ways for solving the LASSO. These benefits make LASSO potentially highly useful in genetics research. In many recent papers, LASSO was used for selecting variants and building prediction models. The aim of this study is to develop, implement, and validate LASSO by using GWAS summary statistics (SOJO) for genomic loci discovered in standard GWASs. First, we show that using summary-level data for LASSO achieves results that are approximately equivalent to those obtained when LASSO is based on individual-level data. We then provide simulation studies to show how SOJO can outperform GCTA-COJO in finding additional association signals in loci with different LD structures. We applied SOJO on GWAS summary-level data of three anthropometric traits—height, body mass index (BMI), and waist-to-hip ratio after adjustment for body mass index (WHRadjBMI) reported by the GIANT consortium—and validated the out-of-sample predictive performance in the large national cohort UK Biobank (UKB). By implementing SOJO, we have added additional association information to the results of standard GWASs and GCTA-COJO analyses, improved out-of-sample predictive heritability, and revealed different levels of allelic heterogeneity for different traits. The SOJO analysis is implemented in our free and open-source R package.

Material and Methods

LASSO Regularization Path Based on GWAS Summary Statistics

In this section, we describe how to achieve LASSO estimates by using summary-level statistics from a GWAS meta-analysis and a reference sample. Assume a quantitative trait y is potentially affected by a group of genetic variants and a multi-variant linear modelwhere . If we have n individuals, then is the phenotype vector, and is the genotype matrix. To get an estimate of regression coefficients , we look at the square loss function and the -norm regularization , which leads to the LASSO optimization problemwhere the tuning parameter . The regularization path can be used to compute LASSO estimates in Equation 2 as a function of λ, denoted by , for all . Interestingly, when the sample size is large, the regularization-path algorithm only depends on (1) the covariance structure between variants and the trait, and (2) the LD structure between variants. Therefore, we can approximate LASSO estimates by using summary-level statistics from a GWAS meta-analysis and a reference sample. The first step is to get the covariance structure between variants and the trait. To simplify the formulae, we center the data so that and , where , and the intercept does not need to be included. Because the centering does not affect the estimates of slope in summary-level statistics, we can take the GWAS results in meta-analysis as they are from centered data. Then in the GWAS, each variant is fitted according to a univariate regression model: Based on Equation 3, the marginal effect of variant j isand its variance iswhere is the residual variance in univariate regression (Equation 3) and is the phenotypic variance. Because the effect of a single variant is usually small, we can approximate by . From Equation 4 and Equation 5, we havewhere all terms on the right side except are reported in the GWAS meta-analysis results. For , because all s and λ in the algorithm are proportional to , it is fine to assume if only the variable selection or explained by polygenic scores is concerned. If exact estimates of coefficients are needed, can be estimated by the phenotypic variance in the reference sample mentioned below. The LD structure between variants can be approximated by a representative reference sample where individual-level genotype data are available. A proper reference sample can be a cohort included in the meta-analysis study. Let represent the genotype matrix of the reference sample. Then To simplify symbols, we define and . Considering different allele frequency between variants, we suggest using and with standardized . Let denote the diagonal matrix of . Standardized leads to Let k be the step counter, be the tuning parameter at the current step, denote the sign of , and be the active set. Starting with , and . The LASSO regularization path algorithm can be implemented as follows; 1. Get the next hitting timewhere means the maximum argument that is smaller than . Denote the index of the hitting variable as and its sign as . Specifically, 2. Get the next crossing timewhere means the maximum argument that is smaller than . Denote the index of the crossing variable as and its sign as . Specifically, . 3. LetIf , then add the index of the hitting variable to and its sign to . If , then remove the index of the crossing variable from and its sign from . 4. Get the LASSO estimate at from 5. Then update k to and repeat steps 1–4 until . If the standardized is used, and if the coefficients under standardization with tuning parameter λ are denoted as , then the coefficients on the original scalefor any . In GWAS meta-analysis results, the sample sizes for different variants are usually different because of imputation failures in the studies involved. However, is estimated for each variant separately in Equation 6. Therefore, the above algorithm is still valid.

Summary Statistics of Anthropometric Traits and Individual-Level Genotype Data

The GIANT Consortium performed a GWAS meta-analysis by using the summary statistics from 79, 125, and 101 studies, consisting of 253,288, 322,154, and 210,088 individuals of European ancestry for adult height, BMI, and WHRadjBMI, respectively. Meta-analysis was performed on ∼2.6 million SNPs for all the three traits. After SNPs with MAF 0.01 were excluded, ∼2.5 million SNPs remained. Considering the accuracy of the estimated correlation between SNPs and traits, we excluded SNPs with sample size less than 2/3 of the maximum sample size but retained ∼2.4 million, ∼2.2 million, and ∼1.7 million SNPs for height, BMI, and WHRadjBMI, respectively. We also used the individual-level genotype data of the TwinGene cohort, which is a population-based Swedish study of twins born between 1911 and 1958. Genotyping was done with the Illumina OmniExpress BeadChip. After the quality control, 644,556 SNPs and 9,617 individuals remained, including all available dizygotic twins and one twin from each available monozygotic twin pair. Another source of individual-level genotype data is the 503 European ancestry samples in 1000 Genomes Project phase 3 data.

UK Biobank Data

The UK Biobank recruited 500,000 people aged 40–69 years between 2006 and 2010 from across the country. Here, a wave 1 public release in June 2015 is used. Among individuals whose phenotypic information was available, 152,732 had been genotyped on an Affymetrix chip that included about 800,000 variants. Millions of further variants were imputed. Among the genotyped individuals, 120,286 were identified as genetically British by the UK Biobank. These individuals were taken forward for analysis in this paper. In the prediction performance analyses, height, BMI, and WHRadjBMI in UKB were adjusted for age and sex before being standardized to z-scores.

Application of SOJO at Established Genome-wide Significant Loci

For each trait, first we took all loci with genome-wide significant SNPs reported in GIANT results. There were 423, 77, and 49 loci for height, BMI, and WHRadjBMI, respectively. For each of these loci, we set a 1 Mb window centered at the most significant variant as the genomic locus to be analyzed. We performed SOJO to select the associated variants for each locus by using the following steps: 1. We took the intersection of available variants in GIANT and TwinGene. 2. We estimated LD correlations by using individual-level genotype data in TwinGene. 3. We filtered the variants according to the LD correlation matrix. If the LD between a pair of variants was larger than 0.9, only the more significant one in GWAS meta-analysis was kept for further analysis. 4. We ran the summary-level LASSO algorithm by using summary statistics from GIANT and the filtered estimated LD correlations in step 3. 5. Along the LASSO path, the SNPs were included or removed from the model one by one as λ decreased. For each point, when the active-variant set changed, we computed the out-of-sample on the basis of the current active-variant set and coefficients. 6. We reported the variants that maximized the out-of-sample and their penalized effects. In step 3, we removed one SNP from each pair of extremely correlated variants because (1) including both of them didn’t significantly increase the amount of information gained, and (2) including both might have generated numerical errors when the tuning parameter went to zero and the model approached the standard multiple regression.

Adjust Model Degrees of Freedom for Comparison

For the comparison between SOJO and GCTA-COJO to be fair, the two must be under the same level of model complexity. Degrees of freedom is often used as a measurement of model complexity. When comparing two linear models, both with p predictors, one can say their model complexity is the same because their degrees of freedom are both equal to p. However, when it comes to evaluating two complex variable selection procedures, especially when comparisons or shrinkage is involved, the degrees of freedom or the complexity of the model might no longer be equal to the number of variables selected by the model. Suppose we have observations where For a function producing fitted values based on , the value of the generalized degrees of freedom (GDF) is defined as: Estimating GDF for GCTA-COJO and SOJO with a Monte Carlo method requires individual-level data. Without GIANT individual-level data, we could not directly estimate the GDF when GIANT was the discovery sample. Instead, we saw GDF as a piecewise function of the number of selected variables, and we estimated the function by using the UKB data. First, we estimated the GDF for GCTA-COJO and SOJO locus by locus for each trait by using UKB data. We performed the estimation by using multiple p value thresholds for GCTA-COJO and different tuning parameters for LASSO. In this way, we obtained an estimate of the function mapping the number of selected variables to GDF. Then, for each variable selection result based on GIANT and TwinGene, we could estimate the GDF by using the function. According to our result, if we include k variables in our model, SOJO costs exactly k GDF, which is consistent with theoretical results, whereas GCTA-COJO usually costs more than k GDF. An example is given in Figure S1.

Results

LASSO from Summary-Level Data Approximates That from Individual-Level Data

We can approximate the LASSO result at any tuning parameter by using (1) the covariance structure between variants and the trait and (2) the LD structure between variants. The former covariance structure can be estimated from GWAS meta-analysis summary-level data, and the LD structure can be estimated from a reference sample, such as a subcohort of the GWAS meta-analysis. Figure 1 shows the similarity of LASSO results under six different scenarios. In each plot, each line shows how the coefficient estimates vary under different tuning parameters. Theoretically, when the effects of single variants are all small and the whole cohort is taken as reference sample, the summary-level LASSO estimates are the same as those based on individual-level LASSO results (Figures 1A and 1B). A real scenario can be more complicated in two ways: (1) individual-level data are available only for a subset of the cohort, which affects the estimation of LD correlation between variants, and (2) the sample sizes are usually different for different variants because of, e.g., imputation failures in the studies involved. However, as shown in Figures 1C and 1D, when a relatively large subsample is used as the reference sample and the number of missing individuals for each variant is not substantial, the summary-level LASSO results are close to individual-level LASSO results. When the representative reference samples are outside of the discovery population, the summary-level LASSO results are still similar (Figures 1E and 1F). Our simulation shows that the out-of-sample prediction performance is also similar for these scenarios (Figure S2).

Figure 1

An Example Showing the Approximation of Summary-Level LASSO to Individual-Level LASSO

The phenotype and genotype data are from 120,086 individuals in the UK Biobank. GWAS was performed on height. The curves represent regularization paths of Lasso coefficients. The six plots show LASSO results in six different scenarios of data: (A) LASSO based on individual level data. (B) LASSO based on GWAS summary statistics and LD correlations estimated from the whole cohort. (C) LASSO based on GWAS summary statistics and LD correlations estimated from a subcohort where n = 10,000. (D) LASSO based on unequal sample sizes, GWAS summary statistics, and LD correlations estimated from a subcohort where n = 10,000. The subcohorts in (C) and (D) are randomly sampled from the whole cohort. In (D), for each variant, a subset of individuals with random sample size between 110,000 and 120,086 was taken. Then, GWAS summary statistics were computed on the basis of the data from the unequal sample sizes. (E) The GWAS summary statistics are the same as in (D), but LD correlations are estimated from 9,617 TwinGene samples. (F) The GWAS summary statistics are the same as in (D), but LD correlations are estimated from 503 European ancestry samples in 1000 Genomes.

An Example Showing the Approximation of Summary-Level LASSO to Individual-Level LASSO The phenotype and genotype data are from 120,086 individuals in the UK Biobank. GWAS was performed on height. The curves represent regularization paths of Lasso coefficients. The six plots show LASSO results in six different scenarios of data: (A) LASSO based on individual level data. (B) LASSO based on GWAS summary statistics and LD correlations estimated from the whole cohort. (C) LASSO based on GWAS summary statistics and LD correlations estimated from a subcohort where n = 10,000. (D) LASSO based on unequal sample sizes, GWAS summary statistics, and LD correlations estimated from a subcohort where n = 10,000. The subcohorts in (C) and (D) are randomly sampled from the whole cohort. In (D), for each variant, a subset of individuals with random sample size between 110,000 and 120,086 was taken. Then, GWAS summary statistics were computed on the basis of the data from the unequal sample sizes. (E) The GWAS summary statistics are the same as in (D), but LD correlations are estimated from 9,617 TwinGene samples. (F) The GWAS summary statistics are the same as in (D), but LD correlations are estimated from 503 European ancestry samples in 1000 Genomes.

SOJO Shows High Sensitivity in Most Types of LD Structure

We simulated a model of two correlated causal variants in order to compare SOJO and GCTA-COJO in terms of sensitivity and specificity. The area under the curve (AUC) of SOJO is larger than that of GCTA-COJO in most cases (Figure 2). The only exception is when the directions of the two genetic effects do not agree with the sign of the LD correlation between these two variants, yet the LD is strong. Namely, and, at the same time, is large (Figure 2, bottom-left panel). The exception is relatively unlikely in practice. This can be verified by 147 loci with more than one height-associated variant reported in the GCTA-COJO analysis of GIANT data. If we focus on the first two significant height-associated variants, the top two variants for 24 out of the 147 loci have an absolute value of correlation larger than 0.2. Among these 24 loci, only seven have a discrepancy between the sign of the LD correlation and the directions of the two genetic effects. Therefore, in this case the exception rate is 7/147.

Figure 2

Receiver-Operating-Characteristic Curves Comparing the Performance of SOJO and GCTA-COJO for Correlated Causal-Variant Identification on Simulated Data

Datasets were simulated for 100,000 individuals with 20 variants, where . The allele frequencies are all equal to 0.5. To simplify the model, assuming genotype columns are demeaned, the trait , where are causal variants and . In all simulations, and . varies from 0.8 to 0.5 and 0.2. is either 1 or −1. For both SOJO and GCTA-COJO, the whole sample was taken as the reference sample. For each case, 200 datasets were generated. Solid curves represent SOJO, and dashed curves represent GCTA-COJO.

Receiver-Operating-Characteristic Curves Comparing the Performance of SOJO and GCTA-COJO for Correlated Causal-Variant Identification on Simulated Data Datasets were simulated for 100,000 individuals with 20 variants, where . The allele frequencies are all equal to 0.5. To simplify the model, assuming genotype columns are demeaned, the trait , where are causal variants and . In all simulations, and . varies from 0.8 to 0.5 and 0.2. is either 1 or −1. For both SOJO and GCTA-COJO, the whole sample was taken as the reference sample. For each case, 200 datasets were generated. Solid curves represent SOJO, and dashed curves represent GCTA-COJO.

Analysis of Three Anthropometric Traits

In this study, we took a subcohort of GIANT: the Swedish Twin Registry (TwinGene, n = 9,617) as our reference sample and focused on the 644,556 chip variants in TwinGene. Using GIANT summary statistics for height, BMI, and WHRadjBMI of 253,288, 322,154, and 210,088 individuals, and data for 120,286 individuals from UKB as a validation sample, we prioritized 8,470, 1,026, and 522 jointly associated variants by implementing SOJO on 423, 77, and 49 established loci for the three traits, respectively (Table S1). On average, 20, 13, and 11 variants were selected in each locus for height, BMI, and WHRadjBMI, respectively. In each locus, we performed summary-level LASSO and reported variants and their penalized effects when out-of-sample prediction was maximized in a validation sample. To assess the performance of SOJO and GCTA-COJO, we used for cumulative out-of-sample prediction as a criterion, where GIANT and TwinGene were used for discovery and UKB for validation. For each method, we first set a universal threshold for selection: a p value cut-off for GCTA-COJO, and the number of top variants for SOJO. We then implemented the method on each trait-associated locus reported by GIANT. For each locus, given the universal threshold, a set of candidate variants and their effects were computed. Using genotypes and estimated effects of these variants, we built a polygenic score and computed the proportion of predictable variance from the regional polygenic score in UKB. We then obtained cumulative out-of-sample prediction by summing all regional proportions of explained variance (Figure 3). By setting a fixed number of selected variants for all regions, SOJO still outperforms COJO in terms of prediction performance for all three traits. SOJO achieves maximum of 23.29%, 2.39%, and 2.18% for cumulative out-of-sample prediction when the regional degrees of freedom (described in the Material and Methods) are 29, 21, and 13 for height, BMI, and WHRadjBMI, respectively. The prediction performance of SOJO starts dropping after the regional degrees of freedom increases to 21 and 13 for BMI and WHRadjBMI, but does not drop for height even when the regional degrees of freedom increase to 25. This indicates that the allelic heterogeneity of height is the highest and that it is followed by BMI and WHRadjBMI for their established loci. This ranking is the same as the ranking of the estimated heritability23, 24 and the ranking of the number of loci detected in GIANT papers for the three traits.6, 7, 8 The same analysis was also performed with LD correlations estimated from the 503 European-ancestry samples in the 1000 Genomes Project phase 3 data, and the results are consistent (Figure 3).

Figure 3

Out-of-Sample Prediction Performance Comparison of SOJO and GCTA-COJO in Terms of Height, BMI, and WHRadjBMI

Solid curves represent SOJO, and dashed curves represent GCTA-COJO. The vertical dashed lines represent the average regional degrees of freedom when cumulative out-of-sample starts dropping. The x axis represents the average regional degrees of freedom, which is an estimate of the effective number of parameters in a model. For GCTA-COJO, the regional degrees of freedom are usually larger than the number of selected variants. But for SOJO, the regional degrees of freedom is equal to the number of selected variants (see Material and Methods).

Out-of-Sample Prediction Performance Comparison of SOJO and GCTA-COJO in Terms of Height, BMI, and WHRadjBMI Solid curves represent SOJO, and dashed curves represent GCTA-COJO. The vertical dashed lines represent the average regional degrees of freedom when cumulative out-of-sample starts dropping. The x axis represents the average regional degrees of freedom, which is an estimate of the effective number of parameters in a model. For GCTA-COJO, the regional degrees of freedom are usually larger than the number of selected variants. But for SOJO, the regional degrees of freedom is equal to the number of selected variants (see Material and Methods). When the variable selection thresholds were chosen as those maximizing locus-specific out-of-sample prediction , SOJO again achieved larger out-of-sample prediction than top-SNPs-only prediction and GCTA-COJO (Table 1). If we take height as an example, the maximum proportion of phenotypic variance captured by the 423 locus-specific polygenic scores is 22.7% for SOJO and 21.4% for optimized GCTA-COJO. Compared to 13.8% achieved by the use of top variants only, the amount achieved by SOJO represents an increase of 65% over the out-of-sample prediction . The amount of phenotypic variance captured by polygenic scores is consistent with the maximum cumulative out-of-sample prediction , which indicates that these polygenic scores are almost independent.

Table 1

Maximum Phenotypic Variance Explained by Optimized Polygenic Scores and the Maximum Cumulative Out-of-Sample Prediction for SOJO and GCTA-COJO in UKB

	Cumulative PredictionR2(%)				R2Explained by Polygenic Scores (%)
Trait	Top Variant	Standard COJO	Optimized COJO	SOJO	Top Variant	Standard COJO	Optimized COJO	SOJO

Height	14.35	17.38	23.42	24.52	13.76	16.71	21.42	22.70
BMI	1.88	1.99	2.42	2.52	1.84	1.94	2.35	2.46
WHRadjBMI	1.62	1.78	2.18	2.32	1.58	1.76	2.07	2.28

The for cumulative out-of-sample prediction was computed from a summation of all regional prediction . explained by polygenic scores were the amount of phenotypic variance that could be explained by all regional polygenic scores.

Top variant: only the top variant was selected. Standard COJO: variants selected by COJO with as the threshold. Optimized COJO: variants selected by COJO with threshold maximizing regional prediction . Coefficients of variants in each polygenic score were estimated by joint multiple regression in COJO. SOJO: variants selected by LASSO with tuning parameter maximizing regional prediction . Coefficients of variants in each polygenic score were determined by the LASSO result at the tuning parameter.

Maximum Phenotypic Variance Explained by Optimized Polygenic Scores and the Maximum Cumulative Out-of-Sample Prediction for SOJO and GCTA-COJO in UKB The for cumulative out-of-sample prediction was computed from a summation of all regional prediction . explained by polygenic scores were the amount of phenotypic variance that could be explained by all regional polygenic scores. Top variant: only the top variant was selected. Standard COJO: variants selected by COJO with as the threshold. Optimized COJO: variants selected by COJO with threshold maximizing regional prediction . Coefficients of variants in each polygenic score were estimated by joint multiple regression in COJO. SOJO: variants selected by LASSO with tuning parameter maximizing regional prediction . Coefficients of variants in each polygenic score were determined by the LASSO result at the tuning parameter.

SOJO Reveals Allelic Heterogeneity of Height

There are a number of reasons SOJO might achieve better prediction than GCTA-COJO. First, SOJO detects more underlying causal variants, whereas GCTA-COJO missed these causal variants because of the lower sensitivity under LD. Second, both methods detect variants tagging the same set of causal variants at a locus, but SOJO detects more variants capturing the information in these causal variants. Third, SOJO produces better effect estimates for prediction by using shrinkage estimators. In terms of biology, the first case is the most interesting. Therefore, we did a subsequent analysis to test the existence of the first scenario, i.e., that SOJO detected signals from additional causal variants. We performed this analysis on the 423 height loci by using individual-level data in UKB. In principle, if GCTA-COJO does not miss any causal variant, and if we perform LASSO for each locus by using cross-validation and set a fixed p value threshold, e.g., for GCTA-COJO, the ratio of the number of selected variants obtained with LASSO to that obtained with GCTA-COJO should not be affected by the allelic heterogeneity of the loci. However, higher allelic heterogeneity would lead to a larger possibility of the existence of correlated causal variants. If LASSO has a greater ability to detect causal variants in LD than GCTA-COJO, the ratio of the number of selected variants, i.e., the number of variants selected by LASSO/the number of variants selected by GCTA-COJO per locus, should increase with allelic heterogeneity. Because the allelic heterogeneity of each genetic locus is unknown and the genetic effects across the genome are very small, we used regional heritability as a proxy of allelic heterogeneity. The reasonability of as a proxy of allelic heterogeneity can be validated statistically: regional is significantly correlated with both the number of variants selected by LASSO and the number selected by GCTA-COJO (when is taken as threshold for GCTA-COJO). The correlation coefficients are 0.61 (p = 1.8 × 10−43) and 0.62 (p = 2.3 × 10−44). In terms of choosing a proper threshold for GCTA-COJO, a strict threshold will make LASSO results dominate the ratio, whereas a loose threshold will generate lots of noise (Figure S3). In our analysis, because there are 165 variants in each region on average, we chose as the cut-off, which is loose but still stricter than a 5% significance threshold after Bonferroni correction . The logarithm of the ratio increases significantly with regional (Figure 4) (slope of the regression = 1.21, p = 4.7 × 10−4), i.e., for a locus that has 0.1% more regional than another, the ratio is 1.13 times as large. This significantly positive slope suggests that GCTA-COJO missed some causal variants but that SOJO detected them or additional variants tagging them, and the amount is likely to be bigger when the allelic heterogeneity of the locus is larger. Therefore, the number of variants selected by SOJO is thus a better indicator of the locus-specific allelic heterogeneity. The same analysis was also done for BMI and WHRadjBMI. However, because the numbers of established loci are limited for these two traits, randomness dominated the correlation signal between the number of additional SOJO variants and regional (Figures S4 and S5).

Figure 4

The Ratio of the Number of Variants Selected by SOJO to That Selected by GCTA-COJO in Terms of Height in UKB Tends to Increase as Regional Increases

The plot is in logarithmic scale, and the y axis is labeled in the original scale. Regional is the multivariate regression when all variants at the locus are used. Each dot represents a locus. The red solid line represents the regression line in logarithmic scale. The gray shade denotes the 95% confidence interval for predicted mean values.

The Ratio of the Number of Variants Selected by SOJO to That Selected by GCTA-COJO in Terms of Height in UKB Tends to Increase as Regional Increases The plot is in logarithmic scale, and the y axis is labeled in the original scale. Regional is the multivariate regression when all variants at the locus are used. Each dot represents a locus. The red solid line represents the regression line in logarithmic scale. The gray shade denotes the 95% confidence interval for predicted mean values.

Discussion

We introduced a selection operator, SOJO, that analyzes multi-variant summary association statistics and is based on approximate LASSO shrinkage estimators. SOJO is more powerful than conditional and joint analysis in GCTA in terms of both discovery and prediction. SOJO is computationally fast because it is based on GWAS meta-analysis summary statistics and LD structure estimated from a reference cohort (Table S2). The small effects of genetic variants on complex traits imply that using estimates based on large-scale GWAS meta-analysis can substantially improve the precision of SOJO estimates, which provides a powerful tool for improving variant detection and better estimating genetic effects, especially in loci with LD. In future studies, SOJO might be useful for detecting more associated variants per locus in large-scale GWAS meta-analysis, providing better prediction based on detected loci, and suggesting allelic heterogeneity of complex traits. As in GCTA-COJO, the reference sample is assumed to be from the same population where the meta-analysis sample is from. Therefore, a subcohort involved in the meta-analysis is usually valid as a reference sample. However, an outside sample can also be a reference sample if it well represents the population of interest. The sample size of the reference sample should be large enough so that the LD correlations can be estimated accurately. According to a simulation result by Yang et al., a reference sample with more than 5,000 individuals is sufficient for achieving good accuracy. However, we were careful when using the estimated LD structure to get LASSO results: even though it is possible to implement SOJO on all the variants across the genome, we only applied it regionally. One main reason was that LASSO is more sensitive to the correlations between variants than COJO is, which is also why LASSO achieves better sensitivity and specificity when LD exists. Because of this characteristic, although the LASSO model can stably add top variants at the beginning of the selection procedure, as more and more variants are included, accumulated errors start disturbing the estimates. If SOJO is applied regionally, a relatively small publicly accessible sample such as 1000 Genomes can still be valid as a reference sample. In many regions, SOJO top variants were also selected by GCTA-COJO (Table S1). This is expected because both perform variable selection based on partial correlations. However, it is hard for GCTA-COJO to include more informative variants in its model, especially when the p value threshold is less stringent. The first problem is specificity. As we lower the threshold (and increase the p value threshold), COJO includes more noise than signals. Overfitting is the consequent second problem. Without shrinkage, noise degrades the prediction. But for LASSO, because of shrinkage estimation, both problems are less serious, so LASSO can utilize more information in a genomic locus to obtain better prediction performance as a reward of avoiding overfitting. When the underlying causal variant is multi-allelic (such as with a short-tandem-repeat variation) instead of biallelic, SOJO tends to select multiple tagging SNPs for the causal variant. By doing this, it can better tag the latent multi-allelic causal variant and improve the prediction performance (Figure S6). Evidence shows that jointly analyzing multiple correlated traits can improve both discovery power25, 26 and prediction performance. The possibility of extending LASSO to the multivariate context has been discussed in previous literature.28, 29 It is noteworthy that these multivariate LASSO methods, when applied on GWAS data, asymptotically only depend on (1) the LD correlation matrix, (2) the covariance structure between the phenotypes and genotypes, and (3) the covariance structure among the traits. These can all be estimated from summary association statistics and a reference cohort. Therefore, it is possible to extend SOJO to a multivariate framework in further studies. Here, we use UKB, an independent individual-level data sample, as the validation sample to determine a proper amount of regularization or a reasonable number of variants selected for each locus. If there is no available independent sample, we suggest the use of the reference sample as the validation sample. Although the reference sample is used for estimating LD structure and was included in the GWAS meta-analysis where the summary statistics were from, it can still function as a validation sample because it usually contributes only a little to the estimation of genetic effects. Our method can also be used directly with individual-level data where the reference sample is the whole cohort. In this case, SOJO is equivalent to standard LASSO that is based on individual-level data. If the summary statistics and LD structure have been computed and stored beforehand, SOJO is computationally faster than standard LASSO. Another benefit of SOJO is its ability to handle variants with unequal sample size. This means that when individuals or individual cohorts have missing genotype data, SOJO is able to take this into account to estimate correlations between the variants and the trait instead of removing valuable individuals because of missing data. According to our empirical results, human height not only is a highly polygenic trait but also has high allelic heterogeneity. This interesting coincidence might be due to assortative mating; i.e., individuals prefer partners with similar phenotypes. Assortative mating will increase the proportion of homozygous progeny and prevent the alleles of trait-associated variants from drifting away. A recent study inferred a correlation between trait-associated loci for height (0.200, 0.004 SE), BMI (0.143, 0.007 SE), and waist-to-hip ratio (0.101, 0.041 SE) in partners. This ranking is consistent with the ranking of allelic heterogeneity levels in our results for the three traits. The tuning parameter in LASSO is usually chosen by cross-validation, which is impossible for SOJO because the individual-level data of the GWAS meta-analysis are absent. Variant selection based on one validation sample might be less stable than standard cross-validation. Hence, SOJO could be improved by using the validation sample more thoroughly via splitting or bootstrapping. If one would like to avoid using the validation sample, but use only GWAS summary statistics and the reference sample, some additional methods are worthy of investigation. Although the individual-level data of the GWAS meta-analysis is absent, making it impossible to bootstrap the individuals, one could perform a parametric Monte Carlo simulation on the estimated genetic effects, given that their point estimates and standard errors are available from summary association statistics and their correlations can be estimated from the reference sample. With Monte Carlo simulation results, we can improve the variant detection and phenotypic prediction of SOJO further by implementing bootstrap-based methods such as stability selection or Bolasso. These methods might also be helpful for determining the LASSO tuning parameter when the validation sample is unavailable.

23 in total

1. Predicting survival from microarray data--a comparative study.

Authors: H M Bøvelstad; S Nygård; H L Størvold; M Aldrin; Ø Borgan; A Frigessi; O C Lingjaerde
Journal: Bioinformatics Date: 2007-06-06 Impact factor: 6.937

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

3. Widespread Allelic Heterogeneity in Complex Traits.

Authors: Farhad Hormozdiari; Anthony Zhu; Gleb Kichaev; Chelsea J-T Ju; Ayellet V Segrè; Jong Wha J Joo; Hyejung Won; Sriram Sankararaman; Bogdan Pasaniuc; Sagiv Shifman; Eleazar Eskin
Journal: Am J Hum Genet Date: 2017-05-04 Impact factor: 11.025

4. Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer.

Authors: Jie Peng; Ji Zhu; Anna Bergamaschi; Wonshik Han; Dong-Young Noh; Jonathan R Pollack; Pei Wang
Journal: Ann Appl Stat Date: 2010-03 Impact factor: 2.083

5. A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk.

Authors: Matthias Heinig; Enrico Petretto; Chris Wallace; Leonardo Bottolo; Maxime Rotival; Han Lu; Yoyo Li; Rizwan Sarwar; Sarah R Langley; Anja Bauerfeind; Oliver Hummel; Young-Ae Lee; Svetlana Paskas; Carola Rintisch; Kathrin Saar; Jason Cooper; Rachel Buchan; Elizabeth E Gray; Jason G Cyster; Jeanette Erdmann; Christian Hengstenberg; Seraya Maouche; Willem H Ouwehand; Catherine M Rice; Nilesh J Samani; Heribert Schunkert; Alison H Goodall; Herbert Schulz; Helge G Roider; Martin Vingron; Stefan Blankenberg; Thomas Münzel; Tanja Zeller; Silke Szymczak; Andreas Ziegler; Laurence Tiret; Deborah J Smyth; Michal Pravenec; Timothy J Aitman; Francois Cambien; David Clayton; John A Todd; Norbert Hubner; Stuart A Cook
Journal: Nature Date: 2010-09-08 Impact factor: 49.962

6. Variability in the heritability of body mass index: a systematic review and meta-regression.

Authors: Cathy E Elks; Marcel den Hoed; Jing Hua Zhao; Stephen J Sharp; Nicholas J Wareham; Ruth J F Loos; Ken K Ong
Journal: Front Endocrinol (Lausanne) Date: 2012-02-28 Impact factor: 5.555

7. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

8. Gene regulatory network inference using fused LASSO on multiple data sets.

Authors: Nooshin Omranian; Jeanne M O Eloundou-Mbebi; Bernd Mueller-Roeber; Zoran Nikoloski
Journal: Sci Rep Date: 2016-02-11 Impact factor: 4.379

9. Genetic determination of height-mediated mate choice.

Authors: Albert Tenesa; Konrad Rawlik; Pau Navarro; Oriol Canela-Xandri
Journal: Genome Biol Date: 2016-01-19 Impact factor: 13.583

10. Defining the role of common variation in the genomic and biological architecture of adult human height.

Authors: Andrew R Wood; Tonu Esko; Jian Yang; Sailaja Vedantam; Tune H Pers; Stefan Gustafsson; Audrey Y Chu; Karol Estrada; Jian'an Luan; Zoltán Kutalik; Najaf Amin; Martin L Buchkovich; Damien C Croteau-Chonka; Felix R Day; Yanan Duan; Tove Fall; Rudolf Fehrmann; Teresa Ferreira; Anne U Jackson; Juha Karjalainen; Ken Sin Lo; Adam E Locke; Reedik Mägi; Evelin Mihailov; Eleonora Porcu; Joshua C Randall; André Scherag; Anna A E Vinkhuyzen; Harm-Jan Westra; Thomas W Winkler; Tsegaselassie Workalemahu; Jing Hua Zhao; Devin Absher; Eva Albrecht; Denise Anderson; Jeffrey Baron; Marian Beekman; Ayse Demirkan; Georg B Ehret; Bjarke Feenstra; Mary F Feitosa; Krista Fischer; Ross M Fraser; Anuj Goel; Jian Gong; Anne E Justice; Stavroula Kanoni; Marcus E Kleber; Kati Kristiansson; Unhee Lim; Vaneet Lotay; Julian C Lui; Massimo Mangino; Irene Mateo Leach; Carolina Medina-Gomez; Michael A Nalls; Dale R Nyholt; Cameron D Palmer; Dorota Pasko; Sonali Pechlivanis; Inga Prokopenko; Janina S Ried; Stephan Ripke; Dmitry Shungin; Alena Stancáková; Rona J Strawbridge; Yun Ju Sung; Toshiko Tanaka; Alexander Teumer; Stella Trompet; Sander W van der Laan; Jessica van Setten; Jana V Van Vliet-Ostaptchouk; Zhaoming Wang; Loïc Yengo; Weihua Zhang; Uzma Afzal; Johan Arnlöv; Gillian M Arscott; Stefania Bandinelli; Amy Barrett; Claire Bellis; Amanda J Bennett; Christian Berne; Matthias Blüher; Jennifer L Bolton; Yvonne Böttcher; Heather A Boyd; Marcel Bruinenberg; Brendan M Buckley; Steven Buyske; Ida H Caspersen; Peter S Chines; Robert Clarke; Simone Claudi-Boehm; Matthew Cooper; E Warwick Daw; Pim A De Jong; Joris Deelen; Graciela Delgado; Josh C Denny; Rosalie Dhonukshe-Rutten; Maria Dimitriou; Alex S F Doney; Marcus Dörr; Niina Eklund; Elodie Eury; Lasse Folkersen; Melissa E Garcia; Frank Geller; Vilmantas Giedraitis; Alan S Go; Harald Grallert; Tanja B Grammer; Jürgen Gräßler; Henrik Grönberg; Lisette C P G M de Groot; Christopher J Groves; Jeffrey Haessler; Per Hall; Toomas Haller; Goran Hallmans; Anke Hannemann; Catharina A Hartman; Maija Hassinen; Caroline Hayward; Nancy L Heard-Costa; Quinta Helmer; Gibran Hemani; Anjali K Henders; Hans L Hillege; Mark A Hlatky; Wolfgang Hoffmann; Per Hoffmann; Oddgeir Holmen; Jeanine J Houwing-Duistermaat; Thomas Illig; Aaron Isaacs; Alan L James; Janina Jeff; Berit Johansen; Åsa Johansson; Jennifer Jolley; Thorhildur Juliusdottir; Juhani Junttila; Abel N Kho; Leena Kinnunen; Norman Klopp; Thomas Kocher; Wolfgang Kratzer; Peter Lichtner; Lars Lind; Jaana Lindström; Stéphane Lobbens; Mattias Lorentzon; Yingchang Lu; Valeriya Lyssenko; Patrik K E Magnusson; Anubha Mahajan; Marc Maillard; Wendy L McArdle; Colin A McKenzie; Stela McLachlan; Paul J McLaren; Cristina Menni; Sigrun Merger; Lili Milani; Alireza Moayyeri; Keri L Monda; Mario A Morken; Gabriele Müller; Martina Müller-Nurasyid; Arthur W Musk; Narisu Narisu; Matthias Nauck; Ilja M Nolte; Markus M Nöthen; Laticia Oozageer; Stefan Pilz; Nigel W Rayner; Frida Renstrom; Neil R Robertson; Lynda M Rose; Ronan Roussel; Serena Sanna; Hubert Scharnagl; Salome Scholtens; Fredrick R Schumacher; Heribert Schunkert; Robert A Scott; Joban Sehmi; Thomas Seufferlein; Jianxin Shi; Karri Silventoinen; Johannes H Smit; Albert Vernon Smith; Joanna Smolonska; Alice V Stanton; Kathleen Stirrups; David J Stott; Heather M Stringham; Johan Sundström; Morris A Swertz; Ann-Christine Syvänen; Bamidele O Tayo; Gudmar Thorleifsson; Jonathan P Tyrer; Suzanne van Dijk; Natasja M van Schoor; Nathalie van der Velde; Diana van Heemst; Floor V A van Oort; Sita H Vermeulen; Niek Verweij; Judith M Vonk; Lindsay L Waite; Melanie Waldenberger; Roman Wennauer; Lynne R Wilkens; Christina Willenborg; Tom Wilsgaard; Mary K Wojczynski; Andrew Wong; Alan F Wright; Qunyuan Zhang; Dominique Arveiler; Stephan J L Bakker; John Beilby; Richard N Bergman; Sven Bergmann; Reiner Biffar; John Blangero; Dorret I Boomsma; Stefan R Bornstein; Pascal Bovet; Paolo Brambilla; Morris J Brown; Harry Campbell; Mark J Caulfield; Aravinda Chakravarti; Rory Collins; Francis S Collins; Dana C Crawford; L Adrienne Cupples; John Danesh; Ulf de Faire; Hester M den Ruijter; Raimund Erbel; Jeanette Erdmann; Johan G Eriksson; Martin Farrall; Ele Ferrannini; Jean Ferrières; Ian Ford; Nita G Forouhi; Terrence Forrester; Ron T Gansevoort; Pablo V Gejman; Christian Gieger; Alain Golay; Omri Gottesman; Vilmundur Gudnason; Ulf Gyllensten; David W Haas; Alistair S Hall; Tamara B Harris; Andrew T Hattersley; Andrew C Heath; Christian Hengstenberg; Andrew A Hicks; Lucia A Hindorff; Aroon D Hingorani; Albert Hofman; G Kees Hovingh; Steve E Humphries; Steven C Hunt; Elina Hypponen; Kevin B Jacobs; Marjo-Riitta Jarvelin; Pekka Jousilahti; Antti M Jula; Jaakko Kaprio; John J P Kastelein; Manfred Kayser; Frank Kee; Sirkka M Keinanen-Kiukaanniemi; Lambertus A Kiemeney; Jaspal S Kooner; Charles Kooperberg; Seppo Koskinen; Peter Kovacs; Aldi T Kraja; Meena Kumari; Johanna Kuusisto; Timo A Lakka; Claudia Langenberg; Loic Le Marchand; Terho Lehtimäki; Sara Lupoli; Pamela A F Madden; Satu Männistö; Paolo Manunta; André Marette; Tara C Matise; Barbara McKnight; Thomas Meitinger; Frans L Moll; Grant W Montgomery; Andrew D Morris; Andrew P Morris; Jeffrey C Murray; Mari Nelis; Claes Ohlsson; Albertine J Oldehinkel; Ken K Ong; Willem H Ouwehand; Gerard Pasterkamp; Annette Peters; Peter P Pramstaller; Jackie F Price; Lu Qi; Olli T Raitakari; Tuomo Rankinen; D C Rao; Treva K Rice; Marylyn Ritchie; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Jouko Saramies; Mark A Sarzynski; Peter E H Schwarz; Sylvain Sebert; Peter Sever; Alan R Shuldiner; Juha Sinisalo; Valgerdur Steinthorsdottir; Ronald P Stolk; Jean-Claude Tardif; Anke Tönjes; Angelo Tremblay; Elena Tremoli; Jarmo Virtamo; Marie-Claude Vohl; Philippe Amouyel; Folkert W Asselbergs; Themistocles L Assimes; Murielle Bochud; Bernhard O Boehm; Eric Boerwinkle; Erwin P Bottinger; Claude Bouchard; Stéphane Cauchi; John C Chambers; Stephen J Chanock; Richard S Cooper; Paul I W de Bakker; George Dedoussis; Luigi Ferrucci; Paul W Franks; Philippe Froguel; Leif C Groop; Christopher A Haiman; Anders Hamsten; M Geoffrey Hayes; Jennie Hui; David J Hunter; Kristian Hveem; J Wouter Jukema; Robert C Kaplan; Mika Kivimaki; Diana Kuh; Markku Laakso; Yongmei Liu; Nicholas G Martin; Winfried März; Mads Melbye; Susanne Moebus; Patricia B Munroe; Inger Njølstad; Ben A Oostra; Colin N A Palmer; Nancy L Pedersen; Markus Perola; Louis Pérusse; Ulrike Peters; Joseph E Powell; Chris Power; Thomas Quertermous; Rainer Rauramaa; Eva Reinmaa; Paul M Ridker; Fernando Rivadeneira; Jerome I Rotter; Timo E Saaristo; Danish Saleheen; David Schlessinger; P Eline Slagboom; Harold Snieder; Tim D Spector; Konstantin Strauch; Michael Stumvoll; Jaakko Tuomilehto; Matti Uusitupa; Pim van der Harst; Henry Völzke; Mark Walker; Nicholas J Wareham; Hugh Watkins; H-Erich Wichmann; James F Wilson; Pieter Zanen; Panos Deloukas; Iris M Heid; Cecilia M Lindgren; Karen L Mohlke; Elizabeth K Speliotes; Unnur Thorsteinsdottir; Inês Barroso; Caroline S Fox; Kari E North; David P Strachan; Jacques S Beckmann; Sonja I Berndt; Michael Boehnke; Ingrid B Borecki; Mark I McCarthy; Andres Metspalu; Kari Stefansson; André G Uitterlinden; Cornelia M van Duijn; Lude Franke; Cristen J Willer; Alkes L Price; Guillaume Lettre; Ruth J F Loos; Michael N Weedon; Erik Ingelsson; Jeffrey R O'Connell; Goncalo R Abecasis; Daniel I Chasman; Michael E Goddard; Peter M Visscher; Joel N Hirschhorn; Timothy M Frayling
Journal: Nat Genet Date: 2014-10-05 Impact factor: 38.330

5 in total

1. A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

Authors: Ting-Huei Chen; Nilanjan Chatterjee; Maria Teresa Landi; Jianxin Shi
Journal: J Am Stat Assoc Date: 2020-10-12 Impact factor: 5.033

2. A generalized model for combining dependent SNP-level summary statistics and its extensions to statistics of other levels.

Authors: Gulnara R Svishcheva
Journal: Sci Rep Date: 2019-04-02 Impact factor: 4.379

3. A novel classification method for NSCLC based on the background interaction network and the edge-perturbation matrix.

Authors: Yuan Tian; Caiqing Zhang; Wanru Ma; Alan Huang; Mei Tian; Junyan Zhao; Qi Dang; Yuping Sun
Journal: Aging (Albany NY) Date: 2022-04-09 Impact factor: 5.955

4. Determining Genetic Causal Variants Through Multivariate Regression Using Mixture Model Penalty.

Authors: V S Sundar; Chun-Chieh Fan; Dominic Holland; Anders M Dale
Journal: Front Genet Date: 2018-03-05 Impact factor: 4.599

5. Genome-wide association study of high-sensitivity C-reactive protein, D-dimer, and interleukin-6 levels in multiethnic HIV+ cohorts.

Authors: Brad T Sherman; Xiaojun Hu; Kanal Singh; Lillian Haine; Adam W Rupert; James D Neaton; Jens D Lundgren; Tomozumi Imamichi; Weizhong Chang; H Clifford Lane
Journal: AIDS Date: 2021-02-02 Impact factor: 4.632

5 in total