Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression.

Binglan Li, Shefali S Verma, Yogasudha C Veturi, Anurag Verma, Yuki Bradford, David W Haas, Marylyn D Ritchie.

Abstract

Genome-wide association studies (GWAS) have been successful in facilitating the understanding of the genetic architecture of human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect sizes, GWAS require very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants to several thousand genes. Most importantly, the identified associations can reveal genes that are under eQTL regulation and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing predicted and observed gene expression levels, and also examined some potentially influential factors and a filter criterion with the aim of improving PrediXcan performance. For GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and the surrounding regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs in the prediction did not improve prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives.
Still, PrediXcan was able to prioritize GWAS associations that did not reach the genome-wide significance threshold in GWAS, while retaining GWAS-significant results. This study suggests several ways to evaluate PrediXcan's performance that will be of value to eQTL and complex human disease research.

Year: 2018 PMID: 29218904 PMCID: PMC5749400

Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928

1. Introduction

Genome-wide association studies (GWAS) have successfully identified disease susceptibility loci for complex traits. Yet the disease-related loci discovered to date explain only a small portion of the variance in disease risk[1]. It is not known whether this missing heritability is predominantly driven by variants with small effect sizes or by causal factors beyond genic regions. As a consequence, GWAS have relied on increasing sample sizes, which increases the power to find disease-related loci and provides opportunities for rare variant analysis. However, analyses of larger datasets consume an excessive amount of computational resources, which may not be available to everyone. Moreover, the very large number of single nucleotide polymorphism (SNP) loci relative to the sample size leads to “the curse of dimensionality”[2]. Loci in intergenic regions may also be robustly associated with complex traits, but the mechanisms behind such associations are generally not apparent. Researchers have been trying to integrate functional genomics into GWAS in the expectation that better interpretation of identified associations will facilitate mechanistic studies of complex diseases[3-6]. Much attention has been paid to regulatory elements that change genes’ transcriptional activity and consequently alter phenotypes. Expression quantitative trait loci (eQTLs) are one important class of such regulatory elements[7]. The Genotype-Tissue Expression (GTEx) Project[8] was initiated to identify a comprehensive set of eQTLs across different human tissues and to characterize their relationship to gene expression. PrediXcan[9] is a computational method developed to exploit GTEx data, including the identification of eQTLs and their relationship to complex traits.
PrediXcan evaluates the aggregate effects of cis-regulatory variants (within 1 Mb upstream or downstream of the gene of interest) on gene expression via elastic net regression; consequently, PrediXcan may identify loci with modest to weak effect sizes that do not achieve significance in variant-based association studies. In theory, PrediXcan has a greatly reduced multiple testing burden compared with single-variant-single-trait association tests. For example, given one trait and a genotypic dataset of 10 million SNPs, PrediXcan performs at most about 20,000 tests (one per gene, ~20,000 genes), whereas a single-variant-single-trait association study performs 10 million tests. Putative eQTLs and their effect sizes on gene expression levels in each GTEx tissue type are available online in PredictDB (http://predictdb.org/). Several cases have recently been identified in which eQTLs are likely to play a causal role in disease by regulating gene expression[26,27]. But while more eQTLs have been identified in recent years, it remains challenging to prioritize the ‘true’ causal variants. Because PrediXcan is designed to predict gene expression levels and prioritize GWAS results, it can also be of great use for mechanistic studies. Here, PrediXcan performance was examined using two datasets, to which the PrediXcan whole blood models (whole blood being the tissue type most similar to the samples) were applied. One is the genotypic and transcriptomic data of the Yoruba (YRI) cohort from the 1000 Genomes Project[10]. While perhaps not the optimal dataset, it is readily accessible, which makes it convenient for readers to replicate this study. The other is based on the AIDS Clinical Trials Group (ACTG) protocol A5202[11,12,24], which we refer to as the A5202 cohort hereafter. The A5202 cohort has a sample size large enough for evaluating the association tests (see Methods) and has undergone a thorough variant-based association study[24] for comparison.
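The arithmetic behind this reduction can be made concrete with a short Python sketch. It assumes Bonferroni correction at a family-wise α = 0.05 (the correction used for PrediXcan-TWAS in Section 2.5); the SNP and gene counts are the illustrative round numbers from the example above, not counts from this study's data.

```python
# Per-test Bonferroni thresholds for the illustrative example in the text.
# Assumption: family-wise error rate alpha = 0.05 for both kinds of test.
ALPHA = 0.05


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value threshold that controls the FWER at `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


snp_threshold = bonferroni_threshold(10_000_000)  # single-variant tests
gene_threshold = bonferroni_threshold(20_000)     # gene-based tests

print(f"single-variant tests: P < {snp_threshold:.1e}")
print(f"gene-based tests:     P < {gene_threshold:.1e}")
print(f"gene-level threshold is {gene_threshold / snp_threshold:.0f}x less stringent")
```

With 500-fold fewer tests, a gene-level association only needs to pass 2.5e-06 rather than 5e-09, which is one reason aggregated eQTL effects can reach significance when individual variants do not.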
To test prediction accuracy, PrediXcan’s predicted gene expression levels were compared to the actual gene expression levels measured in the YRI cohort. We also investigated possible influential factors and a filter criterion to increase the likelihood of identifying accurate predictions. For GWAS prioritization, we carried out a transcriptome-wide association study (TWAS) based on PrediXcan predictions to obtain gene-trait associations and evaluated whether these associations prioritized the GWAS results. Our study provides insight into PrediXcan’s capabilities and, more importantly, into the relationships of eQTLs to molecular phenotypes and disease traits, which is of great value in studying transcriptional regulation and disease pathogenesis.

2. Methods & materials

2.1. Data preparation

The YRI cohort from the 1000 Genomes Project was used to evaluate the accuracy of PrediXcan's gene expression predictions. The YRI cohort comprises 75 individuals. All specimens and 4,395,198 variants passed genotype quality control (Hardy-Weinberg equilibrium P > 0.05 and minor allele frequency (MAF) > 5%). For these 75 individuals, the 1000 Genomes Project provided expression levels of 23,723 genes in RPKM (Reads Per Kilobase of transcript per Million mapped reads). Another 1000 Genomes Project cohort, the Northern Europeans from Utah (CEU) cohort, was also included to perform some components of the prediction accuracy test. The CEU cohort comprised 72 individuals and 3,660,275 variants after quality control (Hardy-Weinberg equilibrium P > 0.05 and MAF > 5%). However, because the CEU cohort is part of the Depression Genes and Networks (DGN) cohort that was used to construct PrediXcan's DGN whole blood model, we did not apply the DGN model to predict expression for the CEU cohort. This is the primary rationale for selecting the YRI cohort for our analyses. Genotypic and phenotypic data from the A5202 cohort (ACTG protocol A5202[11,12,24]) were used to evaluate PrediXcan’s ability to prioritize GWAS results. According to self-reported race or ethnicity, the A5202 cohort comprises 47% European, 26% African, and 25% Hispanic Americans. A5202 genotyped and imputed data have been previously studied and reported[24]. Imputed genotypic data were quality checked using PLINK, and strand-unambiguous variants with imputation score > 0.7, MAF > 1%, and Hardy-Weinberg equilibrium P > 0.05 were retained, resulting in 1,221 individuals and 5,091,820 variants. Phenotypic data contained 690 continuous traits, which were based on laboratory assay results from HIV-infected patients before and after initiating antiretroviral therapy.
The 690 traits were derived from plasma atazanavir pharmacokinetics, plasma efavirenz pharmacokinetics, change in CD4+ T-cell count, fasting low-density lipoprotein (LDL)-cholesterol, and fasting triglyceride data. Details about population structures, phenotypes, genotypes, and GWAS strategy are described elsewhere[24].
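The variant-level quality-control criteria described above (a MAF cutoff and Hardy-Weinberg P > 0.05) can be sketched in Python from per-variant genotype counts. This is an illustration only: the function names are hypothetical, and the Hardy-Weinberg test shown is the simple one-degree-of-freedom chi-square approximation, swapped in for readability. PLINK's `--hwe` filter is based on an exact test instead, which behaves better for rare variants and small cohorts such as YRI (75 individuals).

```python
import math


def maf(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium.

    Approximation swapped in for illustration; not PLINK's exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(counts: tuple[int, int, int],
              maf_min: float = 0.05, hwe_p_min: float = 0.05) -> bool:
    """Keep a variant if MAF > maf_min and HWE P > hwe_p_min (YRI/CEU cutoffs)."""
    return maf(*counts) > maf_min and hwe_chisq_p(*counts) > hwe_p_min


print(passes_qc((25, 50, 25)))  # common variant in HWE
print(passes_qc((50, 0, 50)))   # no heterozygotes: strong HWE deviation
print(passes_qc((98, 2, 0)))    # MAF = 0.01, fails the 5% cutoff
```

For the A5202 imputed data, the same function would be called with `maf_min=0.01`, alongside the imputation-score and strand filters that this sketch does not model.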

2.2. Heritability Estimation

To obtain an upper bound on how well a gene's expression level can be predicted from genotypic data, we estimated the narrow-sense heritability of each gene's expression attributable to its model SNPs. Restricted maximum likelihood (REML) analysis was performed using GCTA[13] for each gene included in both the PrediXcan models and the YRI cohort’s gene expression data. Variant-gene relationships were retrieved from the weights table of the PrediXcan models so that exactly the same set of variants was used for heritability and prediction accuracy estimation.
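REML itself is beyond a short example, but its key input, the genetic relationship matrix (GRM) built from a gene's model SNPs, is easy to sketch. The NumPy function below follows the standardized-genotype formula used by GCTA[13] for off-diagonal entries; for simplicity it applies the same formula on the diagonal, whereas GCTA uses a slightly different estimator there. Passing only the SNPs listed for a gene in the PrediXcan weights table mirrors the variant restriction described above; the function name is hypothetical.

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """GCTA-style genetic relationship matrix (simplified diagonal).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele dosages.
    A_jk = (1/M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2 p_i (1 - p_i))
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                     # allele frequency per SNP
    keep = (p > 0) & (p < 1)                     # drop monomorphic SNPs
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))   # standardize each SNP
    return z @ z.T / z.shape[1]                  # average over M SNPs


# Hypothetical dosages: 4 individuals x 3 model SNPs for one gene.
g = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]])
print(standardized_grm(g).round(2))
```

The resulting matrix, together with the observed expression levels for that gene, is what a REML variance-component fit would take as input.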

2.3. Performance of gene expression prediction

PrediXcan provides tissue-specific genotype-expression models, including 44 tissues from GTEx and 1 tissue (whole blood) from DGN[14]. Because the 1000 Genomes Project derived its genotypic and transcriptomic data from cultured blood-derived cell lines, the GTEx whole blood and DGN whole blood models were applied to the genotypic data of the YRI cohort to predict gene expression levels. The square of the Pearson correlation (R2) between predicted and observed gene expression levels was calculated to measure prediction accuracy. To assess directionality, the Pearson correlation coefficient (PCC) between predicted and observed gene expression levels was calculated; we refer to it hereafter as the directionality estimate. PCC is positive when predicted and observed expression levels tend to increase or decrease together across individuals, and negative when they move in opposite directions. Of note, some genes had flat predicted expression levels across individuals despite differing genotypes. The standard deviations of these predictions were 0, so neither R2 nor PCC could be computed, and these genes were dropped from the accuracy estimation. To test which factors influence PrediXcan’s prediction accuracy, we examined relationships between several model characteristics and accuracy estimates (R2). For each predicted gene, we evaluated whether prediction accuracy is influenced by: 1) the number of model variants used, 2) the number of variants used, adjusted for gene length, 3) the percentage of the model's variants that were used for prediction, and 4) the choice of PrediXcan model (tissue-specific models). Gene length was annotated using Biofilter[25].

2.4. Filtering for possibly more accurately predicted genes

In most experimental data analyses, only genotypic data or only transcriptomic data, but not both, are available to perform GWAS or TWAS (see Section 2.5 for details). It is therefore usually impossible to estimate prediction accuracy or genotype-expression heritability and, accordingly, to select more accurately predicted genes for downstream analyses. To address this issue, we explored whether the gene list can be filtered for a subset of more accurately predicted genes without prior knowledge of actual gene expression levels. The filter criterion we tested was the similarity between the predicted gene expression levels from the two whole blood models, GTEx and DGN, because predictions from different models are the easiest for any PrediXcan user to obtain. PCC was used to measure the similarity between prediction results.

2.5. GWAS Prioritization

In addition to predicting gene expression levels for individuals who have SNP data but no gene expression data, we tested PrediXcan’s ability to prioritize GWAS results. Some SNP loci may be omitted from mechanistic studies because they have only modest to weak effects on traits, so their association signals are not strong enough to pass the multiple testing thresholds set by GWAS or phenome-wide association studies (PheWAS[15]). We were interested in whether PrediXcan could prioritize such association signals. We therefore carried out PrediXcan followed by TWAS and compared the association hits to PheWAS results (since we had multiple phenotypes). To obtain gene-trait association p-values, the PrediXcan GTEx whole blood model was applied to the genotypic data from ACTG A5202 to predict gene expression levels. The predicted gene expression levels and 690 traits were then used to perform a phenome-wide TWAS with PLATO[16]. Sex, age, and the first three principal components were used as covariates to adjust for sampling biases and underlying population structure. For variant-trait association studies, to reduce computation time and burden, we only explored variants within or near (1 Mb upstream or downstream of) the PrediXcan-TWAS significant genes (Bonferroni-corrected P < 0.05). Variants were filtered using Biofilter[17]. This window matches the region used by PrediXcan for expression prediction. We then carried out PheWAS using PLATO on the PrediXcan-significant traits and the variants near PrediXcan-significant genes. The association p-values of PrediXcan-TWAS and PheWAS were visualized using ggplot2[18] in R.
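The ±1 Mb window used to select variants for PheWAS can be sketched as a simple interval check. This stands in for the Biofilter step and is not Biofilter's interface: the function name and tuple layouts are hypothetical, and coordinates are assumed to be on the same genome build with inclusive bounds.

```python
WINDOW_BP = 1_000_000  # 1 Mb on each side of the gene body, as in the text


def variants_near_genes(variants, genes, window=WINDOW_BP):
    """IDs of variants lying within `window` bp of any listed gene.

    variants: iterable of (variant_id, chrom, pos)
    genes:    iterable of (gene_name, chrom, start, end); bounds inclusive
    """
    regions = {}
    for _name, chrom, start, end in genes:
        regions.setdefault(chrom, []).append((start - window, end + window))
    return [vid for vid, chrom, pos in variants
            if any(lo <= pos <= hi for lo, hi in regions.get(chrom, ()))]


# Hypothetical gene spanning chr1:2,000,000-2,010,000.
genes = [("GENE_A", "1", 2_000_000, 2_010_000)]
variants = [("v1", "1", 1_000_000), ("v2", "1", 999_999),
            ("v3", "1", 3_010_000), ("v4", "1", 3_010_001),
            ("v5", "2", 2_005_000)]
print(variants_near_genes(variants, genes))
```

A linear scan over all regions is adequate for a few dozen significant genes; for genome-wide gene sets, sorting regions per chromosome and using binary search would scale better.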

3. Results

3.1. Prediction accuracy

Using the genotypic data of the YRI cohort, the PrediXcan DGN and GTEx whole blood models predicted the expression of 11,538 and 6,695 genes, respectively. Prediction performance was evaluated using PCC and R2 for 10,387 DGN genes and 6,127 GTEx genes, respectively (see Section 2.3 for why some genes lacked estimates and for the rationale for using PCC and R2). Because only a limited number of genes were common to both models and the transcriptomic data, heritability estimation was limited to 4,711 genes. We first evaluated how well PrediXcan predictions capture the regulatory effects of variants on gene expression levels (Fig. 1). Genes with higher expression heritability were more likely to have higher R2 values than genes with lower expression heritability, consistent with results published in the initial PrediXcan paper[9]. In theory, the better PrediXcan captures the additive regulatory effects of variants, the closer the h2 estimates (black line) and R2 values (green dots) should be, which is what we observed for genes whose expression levels were influenced by genetic factors (h2 > 0). These results (Fig. 1) suggest that PrediXcan predictions were able to capture genetically driven variability in gene expression levels.

Fig. 1

Prediction performance of DGN (A) and GTEx (B) whole blood tissue model on the YRI cohort. DGN and GTEx whole blood tissue models were applied to the genotypic data from the YRI cohort. Prediction accuracy (R2 of predicted versus observed gene expression levels; green) was compared to the narrow-sense heritability (h2) estimates (black).

We next sought to evaluate PrediXcan’s prediction accuracy. PrediXcan’s DGN and GTEx models had similar performance in predicting gene expression. As indicated in the initial PrediXcan paper[9], PrediXcan accurately predicted expression for some genes (DGN results shown in Fig. 2, GTEx results in Supplementary Figure 2), but prediction accuracy was unsatisfactory overall, as most genes had accuracy estimates near 0 (Fig. 1). For the two whole blood models, the directionality estimates centered on zero with a small standard deviation, suggesting that most predicted gene expression levels did not correlate with the observed levels (Fig. 3). The GTEx model applied to the CEU cohort from the 1000 Genomes Project performed similarly, with a mean of −0.067 and a variance of 0.03 (Supplementary Figure 3). In addition, in all three tests, about half of all predictions were negatively correlated with the observed values, which made interpretation difficult. In short, based on our evaluation, PrediXcan did not predict gene expression well when the DGN and GTEx models were used as training sets to predict gene expression levels in the YRI and CEU cohorts. While this finding may not be surprising, many researchers have assumed that PrediXcan could be used for this purpose, so this examination was worthwhile.

Fig. 2

Examples of well-predicted genes. These plots show the four best-predicted genes based on PrediXcan’s prediction accuracy. Predicted gene expression levels were generated using the DGN whole blood model. Observed expression levels (in RPKM) for the YRI cohort were provided by the 1000 Genomes Project.

Fig. 3

Performance of prediction directionality of PrediXcan models, DGN (top) and GTEx (bottom), on the YRI cohort. Directionality was computed between predicted and observed gene expression levels.

Next, we examined which factors influence the prediction of gene expression and, more importantly, which factors could improve PrediXcan's prediction performance. We first evaluated whether prediction performance depended on specific model properties. For example, would prediction accuracy for a given gene improve if more model variants were present in the input genotypic data? To address this possibility, we explored the relationships between prediction accuracy and three model properties: 1) the number of model variants used for prediction (Fig. 4A); 2) the percentage of model variants used for prediction (Fig. 4B); and 3) the number of model variants used, adjusted for gene length (Fig. 4C). These scatterplots showed a slight improvement in prediction accuracy when more variants were used to predict gene expression levels. However, the relationships were so weak that these model properties could not be used to reliably assess or improve PrediXcan’s prediction performance.

Fig. 4

Prediction accuracy has a weak relationship to the model properties. R2 was computed between observed expression and expression predicted by the GTEx whole blood model. Three genotype-expression model properties were explored: the number (A) and the percentage (B) of model variants used for prediction, and the number of model variants used, adjusted for gene length (C). None of these properties explained the unsatisfactory prediction accuracy, nor could any of them be used as a filtering criterion.

Another potential filtering criterion, the similarity of predicted gene expression levels between the two whole blood models, was also explored. Blood is the most accessible tissue, which gives whole blood models great practical value and makes their prediction accuracy critical. Because PrediXcan provides two whole blood models, we could compare prediction results based on two distinct model cohorts. If a gene's expression is truly regulated by genetic factors, the genotype-expression relationship should be captured regardless of the training cohort, and predicted values should be the same given the same genotype data. Under this assumption, we hypothesized that the predicted expression of a given gene is more likely to be reliable and accurate if the predictions from both whole blood models are similar. As shown in Fig. 5A, we selected three sets of genes whose between-model correlations of predicted expression were low, median, or high. If our hypothesis was correct, prediction accuracy should increase from genes with low similarity to genes with high similarity, which is what we observed (Fig. 5B). Average prediction accuracy increased from 0.023 to 0.084 for the DGN model and from 0.02 to 0.083 for the GTEx model. In effect, genes whose predicted expression levels were more similar between models showed higher prediction accuracy in either whole blood model. However, the filtered results still contained genes whose predicted expression levels were directionally discordant with actual expression levels (data not shown). In summary, similarity between models was a useful but imperfect filter criterion for improving prediction performance. This test of prediction similarity could also be extended to models of different tissue types or to samples from different populations.
It may also be worthwhile to investigate genes whose predictions are both accurate and similar across models, as they could serve as a reference set for future investigations of prediction accuracy. In short, many more evaluations could be done with the PrediXcan models or the underlying GTEx data to better understand SNP-expression relationships across populations, tissues, and genes.
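The grouping analysis behind the low/median/high comparison can be sketched as follows. It assumes genes are split into equal-sized tertiles of cross-model similarity (the text does not state how the three sets were defined) and that R2 has already been computed per gene; the function name and toy values are hypothetical.

```python
import statistics


def mean_r2_by_similarity_tertile(similarity: dict[str, float],
                                  r2: dict[str, float]) -> dict[str, float]:
    """Mean prediction accuracy (R2) within low/median/high similarity tertiles.

    Tertiles are an assumption; genes lacking either value are ignored.
    """
    genes = sorted((g for g in similarity if g in r2), key=similarity.__getitem__)
    if len(genes) < 3:
        raise ValueError("need at least three genes to form tertiles")
    k = len(genes) // 3
    groups = {
        "low": genes[:k],
        "median": genes[k:len(genes) - k],
        "high": genes[len(genes) - k:],
    }
    return {name: statistics.fmean(r2[g] for g in members)
            for name, members in groups.items()}


# Toy values: cross-model similarity and R2 for six hypothetical genes.
sim = {"a": 0.1, "b": 0.2, "c": 0.5, "d": 0.6, "e": 0.9, "f": 0.95}
acc = {"a": 0.01, "b": 0.03, "c": 0.05, "d": 0.07, "e": 0.08, "f": 0.10}
print(mean_r2_by_similarity_tertile(sim, acc))
```

A rising mean from "low" to "high" is the pattern reported for both whole blood models; it says nothing about direction, so discordant (negative PCC) genes can still sit in the high-similarity group.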

Fig. 5

Prediction similarity between two models is at most weakly indicative of prediction accuracy. Prediction similarity was measured as the Pearson correlation of predicted expression between the DGN and the GTEx model. (A) Distribution of prediction similarity. (B) Prediction accuracy by prediction-similarity group; accuracy increases only slightly, if at all, from the lowest- to the highest-similarity group.

3.2. Prioritizing GWAS results

We were also interested in evaluating another use of PrediXcan: prioritization of GWAS results. We wanted to determine whether PrediXcan-TWAS could prioritize important genetic associations that PheWAS could not identify because of biological or statistical limitations. To address this question, variant-trait associations located within 1 Mb upstream or downstream of genes were compared to the gene-trait associations identified by PrediXcan-TWAS, using data from the A5202 cohort (Fig. 6). The nineteen significant genes identified by PrediXcan-TWAS (P < 10−5) were all associated with triglyceride change from baseline to 24 or 48 weeks on treatment. For example, “tgch24_42” in Fig. 6A denotes the change in triglyceride level from baseline (before starting HIV therapy) to week 24, and was the 42nd phenotype collected. Fig. 6A shows that where significant variant-trait associations existed, PrediXcan-TWAS retained the significant signals (P < 10−5). This included 3 genes: DLEU7, DDX1, and NARF. On the other hand, PrediXcan-TWAS prioritized PheWAS associations that approached the significance threshold (P = 10−5; Fig. 6B). This highlighted 9 genes: GPN3, RAP1A, TTC8, SLC5A6, ELOVL7, SUMO1, BAIAP2, OCM, and SPRYD4. The remaining 7 genes had no GWAS association signals in their surrounding regions and thus were likely false positives. According to GRASP[28], loci within DLEU7, DDX1, RAP1A, TTC8, SLC5A6, SUMO1, and SPRYD4 were related to triglycerides in previous studies[19,20]. DDX1 has been reported to play a role in HIV-1 infection[21]. Further studies are needed to determine whether these genes are involved in changes in triglyceride levels during HIV therapy. The other identified genes had no apparent connection with viral infection or triglycerides, but they could be disease-related genes or simply genes that could help fine-map causal genetic factors.
In summary, we demonstrated the ability of PrediXcan to prioritize GWAS results, but the identified gene-trait associations warrant further investigation.

Fig. 6

PrediXcan is able to prioritize GWAS associations. ACTG A5202 imputed genotypic data after quality control were used as input for PrediXcan with the GTEx whole blood model, followed by phenome-wide TWAS. Variants within 1 Mb upstream or downstream of PrediXcan-TWAS significant genes were used to carry out PheWAS. The panels compare p-values of PrediXcan-TWAS associations (green line; grey shaded areas represent gene extent) and PheWAS associations (black dots; blue and red lines denote the suggestive and genome-wide significance thresholds, respectively). (A) PrediXcan-TWAS replicated PheWAS results. (B) PrediXcan prioritized non-significant PheWAS results.

4. Discussion

In this study, we carried out a preliminary investigation of PrediXcan's capabilities to predict gene expression levels and to prioritize GWAS signals. If PrediXcan accurately predicts gene expression from SNP data, the algorithm could have many potential uses, such as imputation of missing transcriptomic data and exploration of the biological mechanisms that link genotype to phenotype. But these analyses are all contingent on the assumption that PrediXcan can accurately predict both the direction of a variant’s effect and the level of gene expression. We tested the prediction accuracy of the two PrediXcan whole blood models, DGN and GTEx. PrediXcan accurately predicted gene expression for some but not all genes. The correlations between predicted and observed gene expression levels were negative for almost half of the genes, which limits the utility of PrediXcan as a transcriptomic data imputation/prediction tool. None of the model properties we explored explained the suboptimal predictions. Im and colleagues examined tissues from GTEx and DGN, and their results suggested that the local genetic architecture of gene expression traits is simple rather than polygenic[22]. In effect, gene expression is genetically regulated by a few eQTLs rather than many. This simple local genetic architecture might explain why including more putative eQTLs did not improve prediction accuracy in our study. Using prediction similarity between the two whole blood models as a filter improved prediction accuracy somewhat, but did not eliminate negative correlations between some predicted and observed gene expression levels. For prioritizing genetic association results, PrediXcan identified genes that were not significant in GWAS and also retained significant variant-trait associations. These results support the utility of PrediXcan.
PrediXcan possesses promising features: it reduces research burden by focusing on genes instead of SNPs, and it maps the regulatory effects of distant SNPs onto their target genes, which most studies overlook because they investigate only genes adjacent to SNPs. Overall, the present study found that PrediXcan performed differently depending on the function evaluated. There are limitations to our study and to the PrediXcan models. First, whole blood is itself a heterogeneous tissue, and we applied the PrediXcan whole blood models to the YRI cohort, whose transcriptomic data actually come from immortalized blood cell lines. Second, the test cohort is limited in sample size and population specificity. The YRI cohort (75 individuals) was the most accessible cohort with both genotypic and transcriptomic data, but it has a different population structure from the PrediXcan model cohorts, DGN and GTEx. Although the GTEx cohort includes African Americans, the GTEx model did not yield better expression predictions. To better investigate the influence of population structure and sample size, we would need genotypic and transcriptomic data from multiple populations with much larger sample sizes. If available, such datasets would also allow us to explore allelic heterogeneity and population-specific eQTLs. Third, we only evaluated the whole blood models, whereas the trait of interest may be regulated in other tissues. For example, changes in blood triglyceride levels may be regulated by metabolism in the liver. Thus, it is of biological interest and necessity to explore other tissue models to better understand tissue-specific SNP-expression-trait relationships in the future.
Last but not least, PrediXcan is based on two assumptions: 1) loci are equivalent in their functional roles as potential eQTLs, even though loci in different functional regions may influence gene expression via different biological mechanisms; and 2) different alleles have the same effect on gene expression. Our study did not specifically evaluate these assumptions. Investigating the relationship between a locus's functional region and its role as an eQTL requires more detailed annotation and categorization of different types of eQTLs. In addition, researchers have examined allelic expression, which could inform the future design of PrediXcan’s SNP-expression models[23]. Although challenges remain, PrediXcan has illuminated a new path for GWAS: incorporating functional genomics and providing mechanistic insights into identified genetic associations. PrediXcan-TWAS results suggest that behind an association, a group of cis-eQTLs regulates gene expression and consequently affects the phenotype. More study is needed to assess PrediXcan’s ability to predict gene expression levels and prioritize GWAS results, which will hopefully further our understanding of the relationships between eQTLs, gene expression levels, and phenotypes or disease traits.

References (26 in total)

1. Epigenetic modifications and human disease.

Authors: Anna Portela; Manel Esteller
Journal: Nat Biotechnol Date: 2010-10

2. Travelling the world of gene-gene interactions.

Authors: Kristel Van Steen
Journal: Brief Bioinform Date: 2011-03-26

3. The role of regulatory variation in complex traits and disease.

Authors: Frank W Albert; Leonid Kruglyak
Journal: Nat Rev Genet Date: 2015-02-24

4. Tools and best practices for data processing in allelic expression analysis.

Authors: Stephane E Castel; Ami Levy-Moonshine; Pejman Mohammadi; Eric Banks; Tuuli Lappalainen
Journal: Genome Biol Date: 2015-09-17

5. Abacavir-lamivudine versus tenofovir-emtricitabine for initial HIV-1 therapy.

Authors: Paul E Sax; Camlin Tierney; Ann C Collier; Margaret A Fischl; Katie Mollan; Lynne Peeples; Catherine Godfrey; Nasreen C Jahed; Laurie Myers; David Katzenstein; Awny Farajallah; James F Rooney; Belinda Ha; William C Woodward; Susan L Koletar; Victoria A Johnson; P Jan Geiseler; Eric S Daar
Journal: N Engl J Med Date: 2009-12-01

6. GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database.

Authors: Richard Leslie; Christopher J O'Donnell; Andrew D Johnson
Journal: Bioinformatics Date: 2014-06-15

7. Genomic variation. Impact of regulatory variation from RNA to protein.

Authors: Alexis Battle; Zia Khan; Sidney H Wang; Amy Mitrano; Michael J Ford; Jonathan K Pritchard; Yoav Gilad
Journal: Science Date: 2014-12-18

8. Biological, clinical and population relevance of 95 loci for blood lipids.

Authors: Tanya M Teslovich; Kiran Musunuru; Albert V Smith; Andrew C Edmondson; Ioannis M Stylianou; Masahiro Koseki; James P Pirruccello; Samuli Ripatti; Daniel I Chasman; Cristen J Willer; Christopher T Johansen; Sigrid W Fouchier; Aaron Isaacs; Gina M Peloso; Maja Barbalic; Sally L Ricketts; Joshua C Bis; Yurii S Aulchenko; Gudmar Thorleifsson; Mary F Feitosa; John Chambers; Marju Orho-Melander; Olle Melander; Toby Johnson; Xiaohui Li; Xiuqing Guo; Mingyao Li; Yoon Shin Cho; Min Jin Go; Young Jin Kim; Jong-Young Lee; Taesung Park; Kyunga Kim; Xueling Sim; Rick Twee-Hee Ong; Damien C Croteau-Chonka; Leslie A Lange; Joshua D Smith; Kijoung Song; Jing Hua Zhao; Xin Yuan; Jian'an Luan; Claudia Lamina; Andreas Ziegler; Weihua Zhang; Robert Y L Zee; Alan F Wright; Jacqueline C M Witteman; James F Wilson; Gonneke Willemsen; H-Erich Wichmann; John B Whitfield; Dawn M Waterworth; Nicholas J Wareham; Gérard Waeber; Peter Vollenweider; Benjamin F Voight; Veronique Vitart; Andre G Uitterlinden; Manuela Uda; Jaakko Tuomilehto; John R Thompson; Toshiko Tanaka; Ida Surakka; Heather M Stringham; Tim D Spector; Nicole Soranzo; Johannes H Smit; Juha Sinisalo; Kaisa Silander; Eric J G Sijbrands; Angelo Scuteri; James Scott; David Schlessinger; Serena Sanna; Veikko Salomaa; Juha Saharinen; Chiara Sabatti; Aimo Ruokonen; Igor Rudan; Lynda M Rose; Robert Roberts; Mark Rieder; Bruce M Psaty; Peter P Pramstaller; Irene Pichler; Markus Perola; Brenda W J H Penninx; Nancy L Pedersen; Cristian Pattaro; Alex N Parker; Guillaume Pare; Ben A Oostra; Christopher J O'Donnell; Markku S Nieminen; Deborah A Nickerson; Grant W Montgomery; Thomas Meitinger; Ruth McPherson; Mark I McCarthy; Wendy McArdle; David Masson; Nicholas G Martin; Fabio Marroni; Massimo Mangino; Patrik K E Magnusson; Gavin Lucas; Robert Luben; Ruth J F Loos; Marja-Liisa Lokki; Guillaume Lettre; Claudia Langenberg; Lenore J Launer; Edward G Lakatta; Reijo Laaksonen; Kirsten O Kyvik; Florian Kronenberg; Inke R König; Kay-Tee Khaw; Jaakko Kaprio; Lee M Kaplan; Asa Johansson; Marjo-Riitta Jarvelin; A Cecile J W Janssens; Erik Ingelsson; Wilmar Igl; G Kees Hovingh; Jouke-Jan Hottenga; Albert Hofman; Andrew A Hicks; Christian Hengstenberg; Iris M Heid; Caroline Hayward; Aki S Havulinna; Nicholas D Hastie; Tamara B Harris; Talin Haritunians; Alistair S Hall; Ulf Gyllensten; Candace Guiducci; Leif C Groop; Elena Gonzalez; Christian Gieger; Nelson B Freimer; Luigi Ferrucci; Jeanette Erdmann; Paul Elliott; Kenechi G Ejebe; Angela Döring; Anna F Dominiczak; Serkalem Demissie; Panagiotis Deloukas; Eco J C de Geus; Ulf de Faire; Gabriel Crawford; Francis S Collins; Yii-der I Chen; Mark J Caulfield; Harry Campbell; Noel P Burtt; Lori L Bonnycastle; Dorret I Boomsma; S Matthijs Boekholdt; Richard N Bergman; Inês Barroso; Stefania Bandinelli; Christie M Ballantyne; Themistocles L Assimes; Thomas Quertermous; David Altshuler; Mark Seielstad; Tien Y Wong; E-Shyong Tai; Alan B Feranil; Christopher W Kuzawa; Linda S Adair; Herman A Taylor; Ingrid B Borecki; Stacey B Gabriel; James G Wilson; Hilma Holm; Unnur Thorsteinsdottir; Vilmundur Gudnason; Ronald M Krauss; Karen L Mohlke; Jose M Ordovas; Patricia B Munroe; Jaspal S Kooner; Alan R Tall; Robert A Hegele; John J P Kastelein; Eric E Schadt; Jerome I Rotter; Eric Boerwinkle; David P Strachan; Vincent Mooser; Kari Stefansson; Muredach P Reilly; Nilesh J Samani; Heribert Schunkert; L Adrienne Cupples; Manjinder S Sandhu; Paul M Ridker; Daniel J Rader; Cornelia M van Duijn; Leena Peltonen; Gonçalo R Abecasis; Michael Boehnke; Sekar Kathiresan
Journal: Nature Date: 2010-08-05

9. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals.

Authors: Alexis Battle; Sara Mostafavi; Xiaowei Zhu; James B Potash; Myrna M Weissman; Courtney McCormick; Christian D Haudenschild; Kenneth B Beckman; Jianxin Shi; Rui Mei; Alexander E Urban; Stephen B Montgomery; Douglas F Levinson; Daphne Koller
Journal: Genome Res Date: 2013-10-03 Impact factor: 9.043

10. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

13 in total

1. Brain neurotransmitter transporter/receptor genomics and efavirenz central nervous system adverse events.

Authors: David W Haas; Yuki Bradford; Anurag Verma; Shefali S Verma; Joseph J Eron; Roy M Gulick; Sharon A Riddler; Paul E Sax; Eric S Daar; Gene D Morse; Edward P Acosta; Marylyn D Ritchie
Journal: Pharmacogenet Genomics Date: 2018-07 Impact factor: 2.089

2. Disentangling genetic feature selection and aggregation in transcriptome-wide association studies.

Authors: Chen Cao; Pathum Kossinna; Devin Kwok; Qing Li; Jingni He; Liya Su; Xingyi Guo; Qingrun Zhang; Quan Long
Journal: Genetics Date: 2022-02-04 Impact factor: 4.402

3. Associations of carotid intima media thickness with gene expression in whole blood and genetically predicted gene expression across 48 tissues.

Authors: Andy B Castaneda; Lauren E Petty; Markus Scholz; Rick Jansen; Stefan Weiss; Xiaoling Zhang; Katharina Schramm; Frank Beutner; Holger Kirsten; Ulf Schminke; Shih-Jen Hwang; Carola Marzi; Klodian Dhana; Adrie Seldenrijk; Knut Krohn; Georg Homuth; Petra Wolf; Marjolein J Peters; Marcus Dörr; Annette Peters; Joyce B J van Meurs; André G Uitterlinden; Maryam Kavousi; Daniel Levy; Christian Herder; Gerard van Grootheest; Melanie Waldenberger; Christa Meisinger; Wolfgang Rathmann; Joachim Thiery; Joseph Polak; Wolfgang Koenig; Jochen Seissler; Joshua C Bis; Nora Franceschini; Claudia Giambartolomei; Albert Hofman; Oscar H Franco; Brenda W J H Penninx; Holger Prokisch; Henry Völzke; Markus Loeffler; Christopher J O'Donnell; Jennifer E Below; Abbas Dehghan; Paul S de Vries
Journal: Hum Mol Genet Date: 2022-03-31 Impact factor: 5.121

4. Precision medicine: from diplotypes to disparities towards improved health and therapies.

Authors: Dana C Crawford; Alexander A Morgan; Joshua C Denny; Bruce J Aronow; Steven E Brenner
Journal: Pac Symp Biocomput Date: 2018

5. Influence of tissue context on gene prioritization for predicted transcriptome-wide association studies.

Authors: Binglan Li; Yogasudha Veturi; Yuki Bradford; Shefali S Verma; Anurag Verma; Anastasia M Lucas; David W Haas; Marylyn D Ritchie
Journal: Pac Symp Biocomput Date: 2019

6. Genome-wide analysis of dental caries and periodontitis combining clinical and self-reported data.

Authors: Dmitry Shungin; Simon Haworth; Kimon Divaris; Cary S Agler; Yoichiro Kamatani; Myoung Keun Lee; Kelsey Grinde; George Hindy; Viivi Alaraudanjoki; Paula Pesonen; Alexander Teumer; Birte Holtfreter; Saori Sakaue; Jun Hirata; Yau-Hua Yu; Paul M Ridker; Franco Giulianini; Daniel I Chasman; Patrik K E Magnusson; Takeaki Sudo; Yukinori Okada; Uwe Völker; Thomas Kocher; Vuokko Anttonen; Marja-Liisa Laitala; Marju Orho-Melander; Tamar Sofer; John R Shaffer; Alexandre Vieira; Mary L Marazita; Michiaki Kubo; Yasushi Furuichi; Kari E North; Steve Offenbacher; Erik Ingelsson; Paul W Franks; Nicholas J Timpson; Ingegerd Johansson
Journal: Nat Commun Date: 2019-06-24 Impact factor: 14.919

7. The transcriptome-wide association search for genes and genetic variants which associate with BMI and gestational weight gain in women with type 1 diabetes.

Authors: Agnieszka H Ludwig-Słomczyńska; Michał T Seweryn; Przemysław Kapusta; Ewelina Pitera; Urszula Mantaj; Katarzyna Cyganek; Paweł Gutaj; Łucja Dobrucka; Ewa Wender-Ożegowska; Maciej T Małecki; Paweł P Wołkow
Journal: Mol Med Date: 2021-01-20 Impact factor: 6.354

8. Combinatorial and statistical prediction of gene expression from haplotype sequence.

Authors: Berk A Alpay; Pinar Demetci; Sorin Istrail; Derek Aguiar
Journal: Bioinformatics Date: 2020-07-01 Impact factor: 6.937

9. Genotype-Based Gene Expression in Colon Tissue-Prediction Accuracy and Relationship with the Prognosis of Colorectal Cancer Patients.

Authors: Heike Deutelmoser; Justo Lorenzo Bermejo; Axel Benner; Korbinian Weigl; Hanla A Park; Mariam Haffa; Esther Herpel; Martin Schneider; Cornelia M Ulrich; Michael Hoffmeister; Jenny Chang-Claude; Hermann Brenner; Dominique Scherer
Journal: Int J Mol Sci Date: 2020-10-31 Impact factor: 5.923

10. A Review of Integrative Imputation for Multi-Omics Datasets.

Authors: Meng Song; Jonathan Greenbaum; Joseph Luttrell; Weihua Zhou; Chong Wu; Hui Shen; Ping Gong; Chaoyang Zhang; Hong-Wen Deng
Journal: Front Genet Date: 2020-10-15 Impact factor: 4.599