
Multi-polygenic score approach to trait prediction.

E Krapohl¹, H Patel^2,3, S Newhouse^2,3,4, C J Curtis^1,2, S von Stumm⁵, P S Dale⁶, D Zabaneh¹, G Breen^1,2, P F O'Reilly¹, R Plomin¹.

Abstract

A primary goal of polygenic scores, which aggregate the effects of thousands of trait-associated DNA variants discovered in genome-wide association studies (GWASs), is to estimate individual-specific genetic propensities and predict outcomes. This is typically achieved using a single polygenic score, but here we use a multi-polygenic score (MPS) approach to increase predictive power by exploiting the joint power of multiple discovery GWASs, without assumptions about the relationships among predictors. We used summary statistics of 81 well-powered GWASs of cognitive, medical and anthropometric traits to predict three core developmental outcomes in our independent target sample: educational achievement, body mass index (BMI) and general cognitive ability. We used regularized regression with repeated cross-validation to select from and estimate contributions of 81 polygenic scores in a UK representative sample of 6710 unrelated adolescents. The MPS approach predicted 10.9% variance in educational achievement, 4.8% in general cognitive ability and 5.4% in BMI in an independent test set, predicting 1.1%, 1.1%, and 1.6% more variance than the best single-score predictions. As other relevant GWA analyses are reported, they can be incorporated in MPS models to maximize phenotype prediction. The MPS approach should be useful in research with modest sample sizes to investigate developmental, multivariate and gene-environment interplay issues and, eventually, in clinical settings to predict and prevent problems using personalized interventions.

Year: 2017 PMID: 28785111 PMCID: PMC5681246 DOI: 10.1038/mp.2017.163

Source DB: PubMed Journal: Mol Psychiatry ISSN: 1359-4184 Impact factor: 15.992

Introduction

Genome-wide association studies (GWASs) have been successful in identifying thousands of associations for hundreds of complex traits and common disorders.[1] One use of GWAS results is to understand biological pathways between genotypes and phenotypes. Another use, the focus of the present research, is to estimate genetic propensities of individuals to predict individuals’ future problems and potential and, eventually, to develop personalized interventions that meet individual medical, psychiatric and educational needs. Both goals have been hindered by the ubiquitous GWA finding that the largest effect sizes are extremely small.[2] For example, the largest population effect sizes found for common variants in height or body mass index (BMI) account for only ~1% of the variance.[3, 4] We know empirically that the vast majority of common genetic variants for most traits have a markedly lower effect than 1%.[2] The highly polygenic nature of complex traits and common disorders poses an immense challenge for understanding the biological mechanisms linking single variants with phenotypes. However, when the priority is phenotypic prediction, polygenic scores can be used to aggregate the effects of many DNA variants in order to investigate their joint predictive power.[5, 6] Rather than just using single-nucleotide polymorphisms (SNPs) that reach genome-wide significance, a recent development is to aggregate a much larger number of SNPs, weighted by their GWA effect size estimate, as long as together they increase the prediction in an independent sample, even if some SNPs have no real effect.[7] For example, for height, a polygenic score that aggregates the effects of ~2000 SNPs accounts for 21% of the variance of height in independent samples.[3] The other defining characteristic of complex traits and common disorders is the abundance of genetic correlations between them. 
There is consistent evidence for genetic correlations between psychiatric disorders, between anthropometric traits and between educational and cognitive traits, as well as for genetic correlations across these categories.[8, 9, 10, 11] Genetic correlation can arise from pleiotropy, the phenomenon of multiple traits being associated with the same gene or genetic variant.[8] Genetic correlation can also reflect shared biological pathways or more indirect linkage.[12] Regardless of its cause, genetic correlation between different traits means that a polygenic score based on one trait can predict a different outcome trait, with predictive accuracy a function of the shared genetic signal between them. Therefore, when the aim is prediction, genetic correlation can be exploited for trait prediction while remaining agnostic to the underlying mechanisms. A primary goal of polygenic scores, which aggregate the effects of thousands of trait-associated genetic variants discovered in GWAS, is to estimate individual-specific genetic propensities. This is typically achieved using a single polygenic score, but here we use an approach to increase predictive power by exploiting the joint power of multiple discovery GWASs. We use a multi-polygenic score (MPS) approach that exploits genetic correlations between the outcome trait and a multitude of traits by using the joint predictive power of multiple polygenic scores in one regression model. We selected GWASs from a centralized repository of summary statistics—based on their statistical power and regardless of prior evidence for association with the outcomes—to predict three core developmental outcomes in our independent target sample: educational achievement, BMI, and general cognitive ability. 
Using repeated cross-validation, we trained and validated the prediction models using elastic net regularized regression, a multiple regression model suited to deal with a large number of correlated predictors while preventing overfitting.[13] We subsequently tested how well these models predict outcomes in an independent test set. Here, we employ an MPS approach that uses publicly available GWAS summary statistics to estimate individual-level genetic propensities and predict developmental outcomes in an independent target sample. This stands in contrast to multi-trait approaches that rely on access to individual-level data in the discovery data sets. Those approaches make use of a method from animal breeding in which the total genetic effect (‘breeding value’) of each individual in a discovery data set is estimated from the best linear unbiased predictor in a multi-trait random-effects model, which can then be used for individual-level prediction in the validation data sets. These multi-trait methods are not applicable to GWAS summary statistics when genotype data are unavailable because of privacy or logistical constraints, as is frequently the case. The declared aim of the current MPS approach is to maximize prediction of developmental outcomes, rather than to investigate their etiology. This stands in contrast to multi-trait meta-analytic approaches to GWAS summary statistics, which rely on substantial and consistent correlations between discovery GWASs and whose main aim is variant discovery.[14, 15, 16, 17] The current MPS approach allows for, but does not require, correlation among polygenic predictors.

Materials and methods

Sample

The target sample comprised genome-wide SNP and phenotypic data from 6710 unrelated adolescents drawn from the UK representative Twins Early Development Study (TEDS). TEDS is a multivariate longitudinal study that recruited over 11 000 twin pairs born in England and Wales in 1994, 1995 and 1996. Both the overall TEDS sample and the genotyped subsample have been shown to be representative of the UK population.[18, 19, 20] The project received approval from the Institute of Psychiatry ethics committee (05/Q0706/228) and parental consent was obtained before data collection. We processed the genotypes for the 6710 individuals using stringent quality control procedures followed by imputation of SNPs using the Haplotype Reference Consortium reference panel[21] (Supplementary Methods S1).

Predictors

Discovery data sets: GWAS summary statistics

We selected GWAS summary statistics from LD Hub, a centralized repository for summary statistics,[22] based on their statistical power, regardless of prior evidence for association with our outcome traits. Specifically, we included 81 GWAS summary statistics that were either publicly downloadable or obtained via correspondence and had a linkage disequilibrium (LD) score[23] heritability z-score >5, indexing good statistical power (which is a function of variance explained and sample size). Supplementary Table S1 provides details of all GWAS summary statistics included in our analyses. The published version of the child IQ GWAS included the present target sample of TEDS. Therefore, to avoid bias, the present analyses used summary statistics from a rerun of the GWAS meta-analysis excluding TEDS.

Polygenic scores

We created 81 genome-wide polygenic scores for each of the 6710 individuals in the TEDS sample using summary statistics from the GWAS described above (Supplementary Table S1). After quality control (Supplementary Methods S1), the study data included 7 581 516 genotyped or well-imputed (info >0.70) SNPs. These were quality controlled and coordinated with each set of summary statistics in turn by excluding markers with nucleotide inconsistencies or low minor allele frequency (<1%). The numbers of markers before and after quality control and coordination with the study data are listed in Supplementary Table S1. We constructed polygenic scores as the weighted sums of each individual’s trait-associated alleles across all SNPs. We used LDpred[24] to construct the scores. LDpred uses a prior on the markers’ effect sizes and adjusts summary statistics for LD between markers. Scores were standardized and adjusted for 30 principal components. More details on the construction of the polygenic scores are provided in Supplementary Methods S2.
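The weighted-sum definition of a polygenic score can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it omits the LDpred adjustment and the principal-component correction, and the genotypes and weights are invented toy values.

```python
from statistics import mean, pstdev


def polygenic_score(allele_counts, weights):
    """Weighted sum of trait-associated allele counts for one individual."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Toy data (hypothetical): 3 individuals x 4 SNPs, allele counts in {0, 1, 2}.
genotypes = [
    [0, 1, 2, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
]
# Hypothetical per-SNP weights standing in for LD-adjusted GWAS effect sizes.
weights = [0.5, -0.25, 0.125, 0.25]

raw_scores = [polygenic_score(row, weights) for row in genotypes]
z_scores = standardize(raw_scores)
```

In practice one such score would be computed per discovery GWAS, giving each individual a vector of 81 standardized scores.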

Outcomes

To illustrate the MPS approach, we selected three key developmental outcomes: (1) educational achievement, operationalized as the mean grade of the three compulsory subjects (Mathematics, English and Science) attained on the standardized United Kingdom General Certificate of Secondary Education (GCSE), taken by almost all (>99%) pupils at the end of compulsory education at age 16 years; (2) general cognitive ability at age 12 years, assessed by two verbal and two nonverbal cognitive standardized tests; and (3) BMI at age 9 years, age and sex adjusted using external reference data. Supplementary Methods S3 and Figure S1 contain detailed descriptions of the three measures.

Models

Single-polygenic score models

To estimate the separate prediction of each predictor, we fit a series of simple linear regression models for each of the 81 polygenic scores and each of the 3 outcomes. For each GWAS-outcome combination, three models were run using polygenic scores created with Gaussian mixture weights of 1, 0.1 and 0.01, respectively. The model that explained the most variance in the outcome (that is, largest cross-validated R2 in training data) was then entered into the multi-score model. These simple linear regression models were fit and validated in repeated 10-fold cross-validation (see section below for details) using the lm function implemented within the caret R package.[25] Based on consistent evidence for extensive genetic correlations across complex traits and disorders, rather than summing up, the predictions of the single-score models were expected to substantially overlap.

MPS models

We used the MPS model to estimate the joint prediction of the 81 polygenic scores as well as the ranking of predictors by the magnitude of their contribution to predicting the outcome. Conventional multiple linear regression models in the presence of a large number of predictors are subject to overfitting, and stepwise regression suffers from upward-biased coefficients and R2 (see, for example, Tibshirani[26]). We used elastic net regularized regression[13] to predict outcomes by selecting predictors and estimating their contribution to the prediction. Regularized regression models are general linear models that employ strict penalties to prevent overfitting. Elastic net allows for estimating the joint predictive ability of a large number of variables while preventing overfitting. Elastic net uses a linear combination of two regularization techniques, L2 regularization (used in ridge regression) and L1 regularization (used in LASSO (least absolute shrinkage and selection operator)), thereby simultaneously implementing variable selection (that is, dropping/retaining variables) and continuous shrinkage (that is, penalizing coefficients for overfitting); it also deals efficiently with multicollinearity by selecting or dropping groups of correlated variables.[13, 27] Elastic net overcomes the limitation of LASSO, which tends to select one variable from a group of correlated predictors and to ignore the others. In situations where predictors are non-independent or correlated (for example, sharing genetic signal or discovery cohorts), the elastic net has the advantage of automatically including all the highly correlated variables in the group (grouping effect).[13, 27, 28] Final model coefficients are analogous to a conventional multiple linear regression output, which allows for a ranking of predictors by the magnitude of their contribution to predicting the outcome. Overall variance explained by the model is indexed by the coefficient of determination, R2. 
We used the glmnet R package[15, 16, 17] implemented within the caret R package[25] to conduct a series of linear elastic net regularized regressions and select polygenic predictors leading to an optimized final model for each outcome. Elastic net regularized regression employs two hyperparameters, alpha and lambda.[13] As recommended to achieve an optimized balance between variance explained and minimum bias, we tuned both the alpha and lambda hyperparameters in repeated 10-fold cross-validation.[29]
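To make the roles of alpha and lambda concrete, the following Python sketch implements elastic net by coordinate descent on a toy data set. It is an illustration rather than the glmnet implementation used in the study. It assumes the penalty parameterization commonly documented for glmnet, (1/2N)·RSS + lambda·[alpha·|b|_1 + (1 − alpha)/2·|b|_2^2], and it assumes centred predictors and outcome so that no intercept is needed. All data values are invented.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) arising from the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha, lam, n_iter=200):
    """Coordinate descent for the objective
    (1/2N) * sum((y - X b)^2) + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    Assumes the columns of X and y are already centred (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the residual that excludes predictor j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Toy data (hypothetical): predictors 0 and 1 are identical, predictor 2 is unrelated.
X = [
    [-1.0, -1.0, -1.0],
    [1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
    [1.0, 1.0, 1.0],
]
y = [-2.0, 2.0, -2.0, 2.0]

lasso = elastic_net(X, y, alpha=1.0, lam=0.5)  # pure L1 penalty
enet = elastic_net(X, y, alpha=0.5, lam=0.5)   # mixed L1/L2 penalty
```

On this toy example the pure L1 fit keeps only the first of the two duplicated predictors, whereas the mixed penalty assigns them equal coefficients, which is the grouping effect described above. Both fits set the unrelated predictor to zero.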

Model training and testing

Generally, a predictive model is considered powerful when the model is capable of predicting outcomes in ‘unseen’ data with high accuracy. The performance of a model can therefore be evaluated by testing how well it predicts phenotypes of individuals whose data were not included in the construction of the prediction model. Each model described in the preceding section was trained and tested using the following three-step strategy: (1) Data splitting: we randomly split the data set into a separate training set and test set (60% train, 40% test). (2) Model training: we used repeated cross-validation on the training set to train and optimize the model via validation. (3) Model testing and comparison: we applied the final model to the independent test set to obtain an unbiased estimate of model performance.
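The data-splitting step can be sketched in a few lines of Python. The function name, the seed and the use of integer IDs are illustrative assumptions. The study's actual split was presumably made in R, and the number of individuals with data on each outcome may differ from the toy total used here.

```python
import random


def train_test_split(ids, train_fraction=0.6, seed=0):
    """Randomly partition IDs into a training set and a held-out test set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy individual IDs (hypothetical); 60% go to training, 40% to testing.
train_ids, test_ids = train_test_split(range(6710))
```

The test IDs are set aside at this point and are not used again until the final model has been chosen.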

Model training

The training set was used to train and validate the model; this included hyperparameter tuning for the elastic net models. In order to optimize the balance between variance explained and minimum bias, we tested each model in 10-fold cross-validation with resampling.[29] We split the training data randomly into 10 equal-sized subsets, using 9 subsets to train the model and the remaining subset as validation. The cross-validation process was repeated 10 times, with each of the 10 subsamples used once as the validation data. Although cross-validation has been shown to produce nearly unbiased estimates of accuracy, variability of these estimates can be reduced by bootstrap methods, wherein available data are repeatedly sampled with replacement in order to mimic the drawing of future random samples.[30, 31] Therefore, to minimize variation across validation data sets, we repeated the 10-fold cross-validation 100 times with random data set partitions.[32] The optimized or ‘final’ model is chosen based on the largest performance value (or smallest mean squared error). Predictors retained within the model and standardized coefficients index whether, and to what extent, they contribute to predicting the outcome. Model performance for the repeated cross-validation in the training set was summarized as mean-cv-R2train from the resampling distribution.
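The resampling scheme described above (10 folds, repeated 100 times with fresh random partitions) can be sketched as a Python generator. This is a simplified stand-in for the resampling that caret performs, not the study's code. The function name and seed are hypothetical.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV.

    Each repeat reshuffles the data and cuts it into k near-equal folds;
    every fold serves exactly once as the validation set within its repeat.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]
        for f, valid in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, valid


# 10 folds x 100 repeats = 1000 train/validation resamples per candidate model.
n_resamples = sum(1 for _ in repeated_kfold(100, k=10, repeats=100))
```

A performance value would be computed on the validation part of every resample and averaged, and the hyperparameter combination with the best average would be retained as the final model.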

Model testing and comparison

To obtain unbiased estimates of model performance, we used the parameters from the final model obtained from the repeated cross-validation in the training set to predict outcomes (that is, educational achievement, BMI and general cognitive ability) in the independent test set. To index prediction accuracy, we used the coefficient of determination, in the following referred to as R2test. Differences between mean-cv-R2train and R2test provide an index of out-of-sample error. We used permutation to test the statistical significance of the difference in predictions between the MPS and the best single-score model. To test the null hypothesis of exchangeability of models, H0: MPSR2test=best-single-scoreR2test, we compared the observed diffR2test (MPSR2test – best-single-scoreR2test) against an empirical null distribution of no difference in predictions between the MPS and the best single-score model. We tested the exchangeability of models by randomly selecting either the MPS or the best single-score model to generate predictions. We then calculated the difference in R2 for two models with shuffled predictions. The process was repeated 100 000 times, generating an empirical null distribution of diffR2 under exchangeability of model predictions. If the null hypothesis of no difference between models is true, it would not matter if we randomly exchange the model used for generating predictions. However, if the observed diffR2test value falls outside of those obtained when randomly exchanging models, this represents evidence against the null hypothesis of no difference in prediction between models. The statistical significance, as expressed in an empirical P-value, is calculated as the fraction of permutation values that are at least as extreme as the original diffR2test statistic observed in nonpermuted data.
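One possible reading of this permutation procedure is sketched below in Python. Several details are assumptions, because the text does not specify them: predictions are exchanged between the two models independently for each individual with probability 0.5, R2 is computed as 1 − SS_res/SS_tot, and the test is two-sided. The toy data are invented.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination, here defined as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def permutation_p_value(observed, pred_a, pred_b, n_perm=10000, seed=0):
    """Empirical P-value for the difference in R2 between two models."""
    rng = random.Random(seed)
    observed_diff = r_squared(observed, pred_a) - r_squared(observed, pred_b)
    extreme = 0
    for _ in range(n_perm):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(pred_a, pred_b):
            # Under the null the model labels are exchangeable, so swap at random.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        null_diff = r_squared(observed, shuffled_a) - r_squared(observed, shuffled_b)
        if abs(null_diff) >= abs(observed_diff):
            extreme += 1
    return observed_diff, extreme / n_perm


# Toy example (hypothetical): model A tracks the outcome more closely than model B.
toy_rng = random.Random(42)
outcome = [toy_rng.gauss(0, 1) for _ in range(300)]
model_a = [o + toy_rng.gauss(0, 0.5) for o in outcome]
model_b = [o + toy_rng.gauss(0, 2.0) for o in outcome]
diff_r2, p_value = permutation_p_value(outcome, model_a, model_b, n_perm=1000)
```

The returned P-value is the fraction of permuted differences at least as extreme as the observed one, matching the definition given in the text.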

Results

MPS predictions

The MPS models showed better prediction in the independent test set than the best single-score models. For both educational achievement and general cognitive ability, the best single-score model was the polygenic score based on the large 2016 GWAS of years of education, which predicted 9.8% of the variance in educational achievement and 3.6% in general cognitive ability in the test set. For BMI, the Obesity class 1 score achieved the best single-score prediction, explaining 3.8% of the variance. (See Supplementary Table S2 for full single-score model results; see Supplementary Figure S2 for a visual overview of the single-score model results.) The MPS models explained 10.9% of the variance in educational achievement, 4.8% in cognitive ability and 5.4% in BMI in the test set. The improvement in variance explained compared with the best single-score models was 1.1% (P=4e−03), 1.1% (P=2e−03) and 1.6% (P=1e−04), respectively. Figures 1a–c show the polygenic predictors selected during training of the MPS models and their standardized coefficients. The ranking of predictors provides an index for their contributions to prediction. Analogous to conventional multiple regression, a standardized coefficient represents the contribution of the predictor to the outcome when adjusting for all other variables in the model.

Figure 1

(a) Multi-polygenic score (MPS) model predicting educational achievement. Standardized coefficients of polygenic predictors selected by elastic net via repeated cross-validation in training set. Analogous to conventional multiple regression, a standardized coefficient represents the contribution of the predictor to the outcome when adjusting for all other variables in the model. The mean variance explained of the resampling distribution from the cross-validation was mean-cv-R2train=0.12. The out-of-sample prediction of the model was R2test=0.109. (b) MPS model predicting general cognitive ability. Standardized coefficients of polygenic predictors selected by elastic net via repeated cross-validation in training set. Analogous to conventional multiple regression, a standardized coefficient represents the contribution of the predictor to the outcome when adjusting for all other variables in the model. The mean variance explained of the resampling distribution from the cross-validation was mean-cv-R2train=0.051. The out-of-sample prediction of the model was R2test=0.048. (c) MPS model predicting body mass index (BMI). Standardized coefficients of polygenic predictors selected by elastic net via repeated cross-validation in training set. Analogous to conventional multiple regression, a standardized coefficient represents the contribution of the predictor to the outcome when adjusting for all other variables in the model. The mean variance explained of the resampling distribution from the cross-validation was mean-cv-R2train=0.074. The out-of-sample prediction of the model was R2test=0.054.

The model predicting educational achievement retained 12 polygenic predictors (Figure 1a). Cognitive and socioeconomic polygenic scores took the top ranks. However, the psychiatric cross-disorder polygenic score, which aggregates genetic risk for bipolar disorder, schizophrenia, major depressive disorder, autism and attention deficit hyperactivity disorder, and the score for depressive symptoms in the general population were also retained by the model. The scores for Homeostasis Model Assessment of β-cell function, an index of β-cell function, and for coronary artery disease also contributed to prediction of educational achievement. The MPS model predicting cognitive ability selected 10 polygenic scores during cross-validation (Figure 1b). The strongest contributions to prediction came from cognitive and socioeconomic variables. Contributions from the psychiatric realm came from major depressive disorder, autism spectrum disorder and bipolar disorder, with the latter two having positive association with cognitive ability. The MPS model predicting BMI retained 28 polygenic scores (Figure 1c). The top three strongest predictions came from obesity-related variables. Ranks four and five were taken by coronary artery disease and age at menarche (negative association). The sixth strongest predictor for children’s BMI was the polygenic score based on the GWAS of mean caudate nucleus volume that plays a role in various non-motor functions including procedural and associative learning and inhibitory action control.[33, 34, 35, 36] Other predictors included ulcerative colitis, leptin and neuroticism.

Stratification by MPS

We examined the phenotypic values by quantile of the MPS distribution. Figures 2a–c plot the observed outcomes against the predictions by the MPS model in the test set. In general, the quantile results were roughly linear.
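A quantile stratification of this kind can be sketched in Python as follows. The function is an illustrative assumption about how such group means might be computed, not the authors' code, and the data are invented.

```python
def decile_means(predicted, observed, n_groups=10):
    """Mean observed outcome within each quantile group of the predictions.

    Individuals are ranked by predicted value and cut into n_groups
    near-equal groups, lowest predictions first.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    means = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        means.append(sum(observed[i] for i in members) / len(members))
    return means


# Toy example (hypothetical): an outcome that rises linearly with the prediction.
toy_predicted = list(range(100))
toy_observed = [2 * p for p in toy_predicted]
toy_means = decile_means(toy_predicted, toy_observed)
```

Plotting the resulting group means against decile rank gives the kind of roughly linear trend described above when prediction and outcome are linearly related.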

Figure 2

(a) Educational achievement by multi-polygenic score (MPS) deciles. Observed mean grade (across the three subjects Mathematics, English and Science) by deciles of the MPS predictions in the test set. Bars represent 95% confidence estimates. (b) General cognitive ability by MPS deciles. Observed mean standardized general cognitive ability by deciles of the MPS predictions in the test set. Bars represent 95% confidence estimates. (c) Body mass index (BMI) by MPS deciles. Observed mean standardized BMI (age and sex adjusted by external reference) by deciles of the MPS predictions in the test set. Bars represent 95% confidence estimates.

Figure 2a shows quantile results for mean exam grades. Individuals in the top 10% of the MPS distribution on average achieved an ‘A’ mean grade (across the three subjects Mathematics, English and Science), whereas individuals in the bottom 10% of the MPS distribution achieved a ‘C’ mean grade on average (top 10% mean=9.74; bottom 10% mean=8.33; 11=A*, 10=A, 9=B, 8=C, 7=D, 6=E, 5=F, 4=G, 0=failed). Cohen’s d was 1.20 (95% confidence interval 0.99–1.41), suggesting that 88% of the top 10% MPS group had a mean grade above that of the bottom 10% group, and that there is an 80% probability that a person picked at random from the top 10% MPS group will have a higher score than a person picked at random from the bottom 10% group.[37, 38] For cognitive ability, Figure 2b illustrates that individuals in the top 10% of the MPS distribution on average had a standardized cognitive ability score 0.64 (95% confidence interval 0.40–0.89) s.d. higher than those in the bottom 10% of the MPS distribution. This means that 74% of the top 10% MPS group had an ability score above the mean of the bottom 10% group, and that there is a 67% probability that a person picked at random from the top 10% MPS group will have a higher score than a person picked at random from the bottom 10% group. For BMI, Figure 2c shows that children in the top 10% of the MPS distribution on average had a BMI 0.80 (95% confidence interval 0.57–1.03) s.d. higher than those in the bottom 10% of the MPS distribution. Expressed differently, 79% of children in the top 10% MPS group had a BMI above the mean of the bottom 10% group, and there is a 71% probability that a person picked at random from the top 10% MPS group will have a higher BMI than a person picked at random from the bottom 10% group.
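The percentages quoted alongside each standardized mean difference are consistent with the standard normal-theory conversions of Cohen's d, as the Python sketch below shows. The text does not state which formulas were used. The sketch assumes normally distributed outcomes with equal variances in both groups, under which the share of the higher group above the lower group's mean is Φ(d) and the probability of superiority is Φ(d/√2).

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def cohens_u3(d):
    """Share of the higher group lying above the mean of the lower group."""
    return normal_cdf(d)


def probability_of_superiority(d):
    """Chance that a random member of the higher group outscores a random
    member of the lower group."""
    return normal_cdf(d / sqrt(2.0))


# Standardized differences between top and bottom MPS deciles reported above.
for label, d in [("educational achievement", 1.20),
                 ("general cognitive ability", 0.64),
                 ("BMI", 0.80)]:
    u3 = round(100 * cohens_u3(d))
    ps = round(100 * probability_of_superiority(d))
    print(f"{label}: d={d:.2f}, U3={u3}%, probability of superiority={ps}%")
```

Under these assumptions the three values of d give 88% and 80%, 74% and 67%, and 79% and 71%, matching the figures in the text.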

Discussion

We demonstrate that the MPS approach that combines summary-level GWAS data from multiple traits yields better individual-level phenotype prediction than single-score predictor models in independent test data. The observation that a multitude of polygenic scores contribute to trait prediction in the MPS models highlights the complexity of the system being studied and the somewhat arbitrary way we divide it into phenotypic characteristics. We show that polygenic variation associated with traits other than the to-be-predicted outcome contributes to prediction. For instance, although there is a known association between ulcerative colitis and BMI,[39] genetic variants associated with ulcerative colitis are not typically included in models estimating individuals’ genetic risk for increased BMI. The predictors selected and coefficients estimated by the MPS models in the current study can be used to generate individual-specific composite estimates of genetic propensities in other and smaller samples. For a more parsimonious replication, future research in other samples could construct a simple multiple regression model using the top five predictors selected by the current analyses. The predictive power of such an MPS model can then be compared with that of the best single-score model. More generally, in addition to the likely improvement in MPS prediction as more and larger GWASs are being published, the MPS approach has the potential to be applied to a wide range of outcomes and samples, including psychiatric and medical outcomes in case–control samples. 
The predictive power of a polygenic score is not only a function of the genetic correlation between discovery and outcome trait, but also of the statistical power present in the discovery GWAS on which it is based (that is, variance explained and sample size).[5] The MPS approach exploits the fact that even GWASs of genetically distantly related traits might contribute predictive power if their power is superior to GWASs of more proximal traits. For instance, most likely because of its much greater sample size, the years of education polygenic score predicted general cognitive ability better than any of the polygenic scores based on GWASs directly measuring general cognitive ability. Because predictive power of polygenic scores does not simply reflect the genetic correlation between discovery and target trait, but depends on the genetic architecture of both traits and sample size (especially of the discovery sample),[5, 6, 40] the MPS approach is not suited for investigating etiology. Other methods have been developed to that end. For instance, multivariate twin studies are appropriate for investigating trait etiology, and multi-trait GWAS meta-analysis aims to disentangle effects of correlated traits at the level of genetic variants.[15, 16, 41, 42, 43, 44, 45] In contrast, the declared aim of the MPS approach is to maximize trait prediction, without assumptions about the relationships among predictors. The MPS approach will be useful whenever trait prediction is a priority. The primary reason for maximizing predictive power using the MPS approach is to predict phenotypes of individuals with as much accuracy as possible. Individual-specific genetic predictions will be useful in research with modest sample sizes to investigate developmental, multivariate and gene–environment interplay issues. 
Eventually, MPS models could be useful in both society and science to estimate genetic potential as well as risk in relation to all domains of functioning, including cognitive abilities and disabilities, personality and health and illness. This predictive power will raise concerns about potential early, even prenatal, prediction. It is important to begin discussions that are informed by the empirical data because genotype-based trait prediction is moving towards the point of practical relevance. Although concerns are warranted, these might be outweighed by the benefits that could result from being able to predict problems and potential early and develop stratified preventions and interventions accordingly.

34 in total

1. A probability-based measure of effect size: robustness to base rates and other factors.

Authors: John Ruscio
Journal: Psychol Methods Date: 2008-03

2. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

Authors: Xiaofeng Zhu; Tao Feng; Bamidele O Tayo; Jingjing Liang; J Hunter Young; Nora Franceschini; Jennifer A Smith; Lisa R Yanek; Yan V Sun; Todd L Edwards; Wei Chen; Mike Nalls; Ervin Fox; Michele Sale; Erwin Bottinger; Charles Rotimi; Yongmei Liu; Barbara McKnight; Kiang Liu; Donna K Arnett; Aravinda Chakravati; Richard S Cooper; Susan Redline
Journal: Am J Hum Genet Date: 2014-12-11 Impact factor: 11.025

3. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw Date: 2010 Impact factor: 6.440

4. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.

Authors: Bjarni J Vilhjálmsson; Jian Yang; Hilary K Finucane; Alexander Gusev; Sara Lindström; Stephan Ripke; Giulio Genovese; Po-Ru Loh; Gaurav Bhatia; Ron Do; Tristan Hayeck; Hong-Hee Won; Sekar Kathiresan; Michele Pato; Carlos Pato; Rulla Tamimi; Eli Stahl; Noah Zaitlen; Bogdan Pasaniuc; Gillian Belbin; Eimear E Kenny; Mikkel H Schierup; Philip De Jager; Nikolaos A Patsopoulos; Steve McCarroll; Mark Daly; Shaun Purcell; Daniel Chasman; Benjamin Neale; Michael Goddard; Peter M Visscher; Peter Kraft; Nick Patterson; Alkes L Price
Journal: Am J Hum Genet Date: 2015-10-01 Impact factor: 11.025

5. Genetic link between family socioeconomic status and children's educational achievement estimated from genome-wide SNPs.

Authors: E Krapohl; R Plomin
Journal: Mol Psychiatry Date: 2015-03-10 Impact factor: 15.992

6. PRSice: Polygenic Risk Score software.

Authors: Jack Euesden; Cathryn M Lewis; Paul F O'Reilly
Journal: Bioinformatics Date: 2014-12-29 Impact factor: 6.937

7. An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations.

Authors: Arunabha Majumdar; Tanushree Haldar; Sourabh Bhattacharya; John S Witte
Journal: PLoS Genet Date: 2018-02-12 Impact factor: 5.917

8. A reference panel of 64,976 haplotypes for genotype imputation.

Authors: Shane McCarthy; Sayantan Das; Warren Kretzschmar; Olivier Delaneau; Andrew R Wood; Alexander Teumer; Hyun Min Kang; Christian Fuchsberger; Petr Danecek; Kevin Sharp; Yang Luo; Carlo Sidore; Alan Kwong; Nicholas Timpson; Seppo Koskinen; Scott Vrieze; Laura J Scott; He Zhang; Anubha Mahajan; Jan Veldink; Ulrike Peters; Carlos Pato; Cornelia M van Duijn; Christopher E Gillies; Ilaria Gandin; Massimo Mezzavilla; Arthur Gilly; Massimiliano Cocca; Michela Traglia; Andrea Angius; Jeffrey C Barrett; Dorrett Boomsma; Kari Branham; Gerome Breen; Chad M Brummett; Fabio Busonero; Harry Campbell; Andrew Chan; Sai Chen; Emily Chew; Francis S Collins; Laura J Corbin; George Davey Smith; George Dedoussis; Marcus Dorr; Aliki-Eleni Farmaki; Luigi Ferrucci; Lukas Forer; Ross M Fraser; Stacey Gabriel; Shawn Levy; Leif Groop; Tabitha Harrison; Andrew Hattersley; Oddgeir L Holmen; Kristian Hveem; Matthias Kretzler; James C Lee; Matt McGue; Thomas Meitinger; David Melzer; Josine L Min; Karen L Mohlke; John B Vincent; Matthias Nauck; Deborah Nickerson; Aarno Palotie; Michele Pato; Nicola Pirastu; Melvin McInnis; J Brent Richards; Cinzia Sala; Veikko Salomaa; David Schlessinger; Sebastian Schoenherr; P Eline Slagboom; Kerrin Small; Timothy Spector; Dwight Stambolian; Marcus Tuke; Jaakko Tuomilehto; Leonard H Van den Berg; Wouter Van Rheenen; Uwe Volker; Cisca Wijmenga; Daniela Toniolo; Eleftheria Zeggini; Paolo Gasparini; Matthew G Sampson; James F Wilson; Timothy Frayling; Paul I W de Bakker; Morris A Swertz; Steven McCarroll; Charles Kooperberg; Annelot Dekker; David Altshuler; Cristen Willer; William Iacono; Samuli Ripatti; Nicole Soranzo; Klaudia Walter; Anand Swaroop; Francesco Cucca; Carl A Anderson; Richard M Myers; Michael Boehnke; Mark I McCarthy; Richard Durbin
Journal: Nat Genet Date: 2016-08-22 Impact factor: 38.330

Review 9. Body Mass Index Is Associated with Inflammatory Bowel Disease: A Systematic Review and Meta-Analysis.

Authors: Jie Dong; Yi Chen; Yuchen Tang; Fei Xu; Chaohui Yu; Youming Li; Prasoon Pankaj; Ning Dai
Journal: PLoS One Date: 2015-12-14 Impact factor: 3.240

10. Defining the role of common variation in the genomic and biological architecture of adult human height.

Authors: Andrew R Wood; Tonu Esko; Jian Yang; Sailaja Vedantam; Tune H Pers; Stefan Gustafsson; Audrey Y Chu; Karol Estrada; Jian'an Luan; Zoltán Kutalik; Najaf Amin; Martin L Buchkovich; Damien C Croteau-Chonka; Felix R Day; Yanan Duan; Tove Fall; Rudolf Fehrmann; Teresa Ferreira; Anne U Jackson; Juha Karjalainen; Ken Sin Lo; Adam E Locke; Reedik Mägi; Evelin Mihailov; Eleonora Porcu; Joshua C Randall; André Scherag; Anna A E Vinkhuyzen; Harm-Jan Westra; Thomas W Winkler; Tsegaselassie Workalemahu; Jing Hua Zhao; Devin Absher; Eva Albrecht; Denise Anderson; Jeffrey Baron; Marian Beekman; Ayse Demirkan; Georg B Ehret; Bjarke Feenstra; Mary F Feitosa; Krista Fischer; Ross M Fraser; Anuj Goel; Jian Gong; Anne E Justice; Stavroula Kanoni; Marcus E Kleber; Kati Kristiansson; Unhee Lim; Vaneet Lotay; Julian C Lui; Massimo Mangino; Irene Mateo Leach; Carolina Medina-Gomez; Michael A Nalls; Dale R Nyholt; Cameron D Palmer; Dorota Pasko; Sonali Pechlivanis; Inga Prokopenko; Janina S Ried; Stephan Ripke; Dmitry Shungin; Alena Stancáková; Rona J Strawbridge; Yun Ju Sung; Toshiko Tanaka; Alexander Teumer; Stella Trompet; Sander W van der Laan; Jessica van Setten; Jana V Van Vliet-Ostaptchouk; Zhaoming Wang; Loïc Yengo; Weihua Zhang; Uzma Afzal; Johan Arnlöv; Gillian M Arscott; Stefania Bandinelli; Amy Barrett; Claire Bellis; Amanda J Bennett; Christian Berne; Matthias Blüher; Jennifer L Bolton; Yvonne Böttcher; Heather A Boyd; Marcel Bruinenberg; Brendan M Buckley; Steven Buyske; Ida H Caspersen; Peter S Chines; Robert Clarke; Simone Claudi-Boehm; Matthew Cooper; E Warwick Daw; Pim A De Jong; Joris Deelen; Graciela Delgado; Josh C Denny; Rosalie Dhonukshe-Rutten; Maria Dimitriou; Alex S F Doney; Marcus Dörr; Niina Eklund; Elodie Eury; Lasse Folkersen; Melissa E Garcia; Frank Geller; Vilmantas Giedraitis; Alan S Go; Harald Grallert; Tanja B Grammer; Jürgen Gräßler; Henrik Grönberg; Lisette C P G M de Groot; Christopher J Groves; Jeffrey Haessler; Per Hall; Toomas Haller; Goran Hallmans; Anke Hannemann; Catharina A Hartman; Maija Hassinen; Caroline Hayward; Nancy L Heard-Costa; Quinta Helmer; Gibran Hemani; Anjali K Henders; Hans L Hillege; Mark A Hlatky; Wolfgang Hoffmann; Per Hoffmann; Oddgeir Holmen; Jeanine J Houwing-Duistermaat; Thomas Illig; Aaron Isaacs; Alan L James; Janina Jeff; Berit Johansen; Åsa Johansson; Jennifer Jolley; Thorhildur Juliusdottir; Juhani Junttila; Abel N Kho; Leena Kinnunen; Norman Klopp; Thomas Kocher; Wolfgang Kratzer; Peter Lichtner; Lars Lind; Jaana Lindström; Stéphane Lobbens; Mattias Lorentzon; Yingchang Lu; Valeriya Lyssenko; Patrik K E Magnusson; Anubha Mahajan; Marc Maillard; Wendy L McArdle; Colin A McKenzie; Stela McLachlan; Paul J McLaren; Cristina Menni; Sigrun Merger; Lili Milani; Alireza Moayyeri; Keri L Monda; Mario A Morken; Gabriele Müller; Martina Müller-Nurasyid; Arthur W Musk; Narisu Narisu; Matthias Nauck; Ilja M Nolte; Markus M Nöthen; Laticia Oozageer; Stefan Pilz; Nigel W Rayner; Frida Renstrom; Neil R Robertson; Lynda M Rose; Ronan Roussel; Serena Sanna; Hubert Scharnagl; Salome Scholtens; Fredrick R Schumacher; Heribert Schunkert; Robert A Scott; Joban Sehmi; Thomas Seufferlein; Jianxin Shi; Karri Silventoinen; Johannes H Smit; Albert Vernon Smith; Joanna Smolonska; Alice V Stanton; Kathleen Stirrups; David J Stott; Heather M Stringham; Johan Sundström; Morris A Swertz; Ann-Christine Syvänen; Bamidele O Tayo; Gudmar Thorleifsson; Jonathan P Tyrer; Suzanne van Dijk; Natasja M van Schoor; Nathalie van der Velde; Diana van Heemst; Floor V A van Oort; Sita H Vermeulen; Niek Verweij; Judith M Vonk; Lindsay L Waite; Melanie Waldenberger; Roman Wennauer; Lynne R Wilkens; Christina Willenborg; Tom Wilsgaard; Mary K Wojczynski; Andrew Wong; Alan F Wright; Qunyuan Zhang; Dominique Arveiler; Stephan J L Bakker; John Beilby; Richard N Bergman; Sven Bergmann; Reiner Biffar; John Blangero; Dorret I Boomsma; Stefan R Bornstein; Pascal Bovet; Paolo Brambilla; Morris J Brown; Harry Campbell; Mark J Caulfield; Aravinda Chakravarti; Rory Collins; Francis S Collins; Dana C Crawford; L Adrienne Cupples; John Danesh; Ulf de Faire; Hester M den Ruijter; Raimund Erbel; Jeanette Erdmann; Johan G Eriksson; Martin Farrall; Ele Ferrannini; Jean Ferrières; Ian Ford; Nita G Forouhi; Terrence Forrester; Ron T Gansevoort; Pablo V Gejman; Christian Gieger; Alain Golay; Omri Gottesman; Vilmundur Gudnason; Ulf Gyllensten; David W Haas; Alistair S Hall; Tamara B Harris; Andrew T Hattersley; Andrew C Heath; Christian Hengstenberg; Andrew A Hicks; Lucia A Hindorff; Aroon D Hingorani; Albert Hofman; G Kees Hovingh; Steve E Humphries; Steven C Hunt; Elina Hypponen; Kevin B Jacobs; Marjo-Riitta Jarvelin; Pekka Jousilahti; Antti M Jula; Jaakko Kaprio; John J P Kastelein; Manfred Kayser; Frank Kee; Sirkka M Keinanen-Kiukaanniemi; Lambertus A Kiemeney; Jaspal S Kooner; Charles Kooperberg; Seppo Koskinen; Peter Kovacs; Aldi T Kraja; Meena Kumari; Johanna Kuusisto; Timo A Lakka; Claudia Langenberg; Loic Le Marchand; Terho Lehtimäki; Sara Lupoli; Pamela A F Madden; Satu Männistö; Paolo Manunta; André Marette; Tara C Matise; Barbara McKnight; Thomas Meitinger; Frans L Moll; Grant W Montgomery; Andrew D Morris; Andrew P Morris; Jeffrey C Murray; Mari Nelis; Claes Ohlsson; Albertine J Oldehinkel; Ken K Ong; Willem H Ouwehand; Gerard Pasterkamp; Annette Peters; Peter P Pramstaller; Jackie F Price; Lu Qi; Olli T Raitakari; Tuomo Rankinen; D C Rao; Treva K Rice; Marylyn Ritchie; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Jouko Saramies; Mark A Sarzynski; Peter E H Schwarz; Sylvain Sebert; Peter Sever; Alan R Shuldiner; Juha Sinisalo; Valgerdur Steinthorsdottir; Ronald P Stolk; Jean-Claude Tardif; Anke Tönjes; Angelo Tremblay; Elena Tremoli; Jarmo Virtamo; Marie-Claude Vohl; Philippe Amouyel; Folkert W Asselbergs; Themistocles L Assimes; Murielle Bochud; Bernhard O Boehm; Eric Boerwinkle; Erwin P Bottinger; Claude Bouchard; Stéphane Cauchi; John C Chambers; Stephen J Chanock; Richard S Cooper; Paul I W de Bakker; George Dedoussis; Luigi Ferrucci; Paul W Franks; Philippe Froguel; Leif C Groop; Christopher A Haiman; Anders Hamsten; M Geoffrey Hayes; Jennie Hui; David J Hunter; Kristian Hveem; J Wouter Jukema; Robert C Kaplan; Mika Kivimaki; Diana Kuh; Markku Laakso; Yongmei Liu; Nicholas G Martin; Winfried März; Mads Melbye; Susanne Moebus; Patricia B Munroe; Inger Njølstad; Ben A Oostra; Colin N A Palmer; Nancy L Pedersen; Markus Perola; Louis Pérusse; Ulrike Peters; Joseph E Powell; Chris Power; Thomas Quertermous; Rainer Rauramaa; Eva Reinmaa; Paul M Ridker; Fernando Rivadeneira; Jerome I Rotter; Timo E Saaristo; Danish Saleheen; David Schlessinger; P Eline Slagboom; Harold Snieder; Tim D Spector; Konstantin Strauch; Michael Stumvoll; Jaakko Tuomilehto; Matti Uusitupa; Pim van der Harst; Henry Völzke; Mark Walker; Nicholas J Wareham; Hugh Watkins; H-Erich Wichmann; James F Wilson; Pieter Zanen; Panos Deloukas; Iris M Heid; Cecilia M Lindgren; Karen L Mohlke; Elizabeth K Speliotes; Unnur Thorsteinsdottir; Inês Barroso; Caroline S Fox; Kari E North; David P Strachan; Jacques S Beckmann; Sonja I Berndt; Michael Boehnke; Ingrid B Borecki; Mark I McCarthy; Andres Metspalu; Kari Stefansson; André G Uitterlinden; Cornelia M van Duijn; Lude Franke; Cristen J Willer; Alkes L Price; Guillaume Lettre; Ruth J F Loos; Michael N Weedon; Erik Ingelsson; Jeffrey R O'Connell; Goncalo R Abecasis; Daniel I Chasman; Michael E Goddard; Peter M Visscher; Joel N Hirschhorn; Timothy M Frayling
Journal: Nat Genet Date: 2014-10-05 Impact factor: 38.330

47 in total

1. Screening Human Embryos for Polygenic Traits Has Limited Utility.

Authors: Ehud Karavani; Or Zuk; Danny Zeevi; Nir Barzilai; Nikos C Stefanis; Alex Hatzimanolis; Nikolaos Smyrnis; Dimitrios Avramopoulos; Leonid Kruglyak; Gil Atzmon; Max Lam; Todd Lencz; Shai Carmi
Journal: Cell Date: 2019-11-21 Impact factor: 41.582

Review 2. African genetic diversity and adaptation inform a precision medicine agenda.

Authors: Luisa Pereira; Leon Mutesa; Paulina Tindana; Michèle Ramsay
Journal: Nat Rev Genet Date: 2021-01-11 Impact factor: 53.242

3. Multi-Polygenic Score Approach to Identifying Individual Vulnerabilities Associated With the Risk of Exposure to Bullying.

Authors: Tabea Schoeler; Shing Wan Choi; Frank Dudbridge; Jessie Baldwin; Lauren Duncan; Charlotte M Cecil; Esther Walton; Essi Viding; Eamon McCrory; Jean-Baptiste Pingault
Journal: JAMA Psychiatry Date: 2019-07-01 Impact factor: 21.596

4. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations.

Authors: Haojie Lu; Ting Wang; Jinhui Zhang; Shuo Zhang; Shuiping Huang; Ping Zeng
Journal: Hum Genet Date: 2021-06-06 Impact factor: 4.132

5. Making the Most of Clumping and Thresholding for Polygenic Scores.

Authors: Florian Privé; Bjarni J Vilhjálmsson; Hugues Aschard; Michael G B Blum
Journal: Am J Hum Genet Date: 2019-11-21 Impact factor: 11.025

6. Genetic risk for coronary heart disease alters the influence of Alzheimer's genetic risk on mild cognitive impairment.

Authors: Jeremy A Elman; Matthew S Panizzon; Mark W Logue; Nathan A Gillespie; Michael C Neale; Chandra A Reynolds; Daniel E Gustavson; Brinda K Rana; Ole A Andreassen; Anders M Dale; Carol E Franz; Michael J Lyons; William S Kremen
Journal: Neurobiol Aging Date: 2019-06-08 Impact factor: 4.673

7. PRSice-2: Polygenic Risk Score software for biobank-scale data.

Authors: Shing Wan Choi; Paul F O'Reilly
Journal: Gigascience Date: 2019-07-01 Impact factor: 6.524

8. Twins Early Development Study: A Genetically Sensitive Investigation into Behavioral and Cognitive Development from Infancy to Emerging Adulthood.

Authors: Kaili Rimfeld; Margherita Malanchini; Thomas Spargo; Gemma Spickernell; Saskia Selzam; Andrew McMillan; Philip S Dale; Thalia C Eley; Robert Plomin
Journal: Twin Res Hum Genet Date: 2019-09-23 Impact factor: 1.587

9. Combined Utility of 25 Disease and Risk Factor Polygenic Risk Scores for Stratifying Risk of All-Cause Mortality.

Authors: Allison Meisner; Prosenjit Kundu; Yan Dora Zhang; Lauren V Lan; Sungwon Kim; Disha Ghandwani; Parichoy Pal Choudhury; Sonja I Berndt; Neal D Freedman; Montserrat Garcia-Closas; Nilanjan Chatterjee
Journal: Am J Hum Genet Date: 2020-08-05 Impact factor: 11.025

Review 10. Polygenic risk scoring and prediction of mental health outcomes.

Authors: John S Anderson; Jess Shade; Emily DiBlasi; Andrey A Shabalin; Anna R Docherty
Journal: Curr Opin Psychol Date: 2018-09-20