Novel genetic analysis for case-control genome-wide association studies: quantification of power and genomic prediction accuracy.

Abstract

Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or on both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.

Year: 2013 PMID: 23977056 PMCID: PMC3747270 DOI: 10.1371/journal.pone.0071494

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

In the last five years, GWAS have been published for both quantitative traits (such as height [1], or blood markers [2]) and disease [3]. In order to assess the relative potential for success of these studies, Yang et al. [4] provided an analytical method for comparison of power. For example, this method has been used to quantify that a sample of ∼50,000 schizophrenia cases and 50,000 controls is needed to afford the same power as the largest published GWAS of height (a total sample size of 180,000) [5]. Use of quantitative endophenotypes rather than binary traits has been proposed as a strategy to increase power in neuropsychiatric disorders [6]. Endophenotypes are measurable quantitative scores that are assumed to be associated with a continuous liability that underlies observed disease status, in which case the quantitative score may be more informative and powerful than binary responses. Of course, the true underlying liability would be the most informative, although it is not observable. Recently, van der Sluis et al. [7] suggested a better use of phenotypic information in GWAS of psychiatric disorders measured in population cohorts. Rather than using binary responses of affected/non-affected, they considered the use of continuous scores from diagnostic instruments. They showed that binary responses based on clinical cut-off criteria decreased power dramatically compared to the use of sum scores of item responses from the diagnostic instrument. The authors recommended that continuous quantitative responses such as sum scores of item responses should be used in psychiatric disorder GWAS, where possible. The study by van der Sluis et al. [7] compared scenarios by simulation and was based on population samples. Here, we provide an analytical method to calculate power in different scenarios with both population and case-control samples. Another potential use of data collected in GWAS is the prediction of genetic risk.
Genomic-enabled prediction is a potentially powerful tool to identify individuals at higher risk of disease [3], [8]. Prediction accuracy plays a crucial role in a successful clinical application of genetic risk prediction of disease, and several studies have evaluated predictive ability [9], [10], [11]. Daetwyler et al. [12] derived a theoretical accuracy for predicting genetic risk from genome-wide SNPs, based on least squares methodology. Many studies have used their formula, which works well for quantitative traits. However, in simulation studies their formula for case-control traits underpredicted the true accuracy (Table 4 of Daetwyler et al. [12]). In this study, we address two issues relevant for the design of case-control GWAS: power and genomic prediction accuracy. First, we derive analytically, in a unified framework, the power of GWAS when using population or ascertained case-control samples with binary as well as quantitative responses. Secondly, we derive genomic prediction accuracy based on the 0,1 observed scale, and transform it to the liability scale using a liability threshold model for disease traits in population [13] and in case-control samples [14]. The expected values based on our derived equations and the average of observed estimates from simulation agree well.

Materials and Methods

Power

Given a specified critical value for significance, the power of a given association study design can be derived from the non-centrality parameter (NCP, $\lambda$) of a $\chi^2$ test of association. Following the methods of Yang et al. [4], we derive the NCP for five experimental designs: (1) quantitative responses in population samples (QT_POP), (2) binary responses in population samples (BT_POP), (3) binary responses in ascertained case-control samples (BT_CC), (4) quantitative responses in ascertained case-control samples (QT_CC) and (5) ascertained case-control samples in which quantitative responses are available for the cases only (QB_CC). The derived NCP for BT_CC, QT_CC and QB_CC are novel.

1) NCP for quantitative responses in population samples. Following Yang et al. [4],

$$\lambda_{QT\_POP} = \frac{N h^2}{1 - h^2} \qquad (1)$$

where $N$ is the total sample size and $h^2$ is the proportion of variance explained by a single genetic marker or set of markers, i.e. multi-locus association tests [15], [16].

2) NCP for binary traits in population samples,

$$\lambda_{BT\_POP} = \frac{N h^2_o}{1 - h^2_o} \qquad (2)$$

where $h^2_o$ is the proportion of variance explained by a genetic marker or set of markers on the observed scale, and $h^2_o = h^2 z^2 / [K(1-K)]$ [13], where $z$ is the height of the normal curve at the threshold truncating the proportion $K$, and $K$ is the proportion of the population that are cases.

3) NCP for binary responses in ascertained case-control samples,

$$\lambda_{BT\_CC} = \frac{N h^2_{o,cc}}{1 - h^2_{o,cc}} \qquad (3)$$

where $h^2_{o,cc} = \sigma^2_{g_{cc}} z^2 P(1-P) / [K(1-K)]^2$ [14], [17], with $P$ the proportion of cases in the case-control sample and $\sigma^2_{g_{cc}}$ the genetic variance in the case-control sample, which is inflated relative to the population sample as a result of the ascertainment process [12] because cases are over-represented compared to the population sample. When $\sigma^2_{g_{cc}} \approx h^2$ and $h^2_{o,cc}$ is small, (3) can be approximated and simplified as $\lambda_{BT\_CC} \approx N h^2 i^2 P(1-P) / (1-K)^2$ with $i = z/K$, which agrees with the derivation based on the relative risk and multiplicative model by Yang et al. [4].

4) NCP for quantitative responses in ascertained case-control samples,

$$\lambda_{QT\_CC} = \frac{N h^2_{QT_{cc}}}{1 - h^2_{QT_{cc}}} \qquad (4)$$

where $h^2_{QT_{cc}} = h^4 \sigma^2_{l_{cc}} / \sigma^2_{g_{cc}}$, and $\sigma^2_{l_{cc}}$ is the variance of disease liability in the case-control sample [18]. This equality is derived from quantitative genetic theory [18] in the following way.
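Firstly, with liability scaled to unit variance in the population, the variance of liability in the ascertained case-control sample is

$$\sigma^2_{l_{cc}} = 1 - \theta, \qquad \theta = \frac{i(P-K)}{1-K}\left[\frac{i(P-K)}{1-K} - t\right] \qquad (5)$$

where $i = z/K$ is the mean liability in cases and $t$ is the threshold on the normal distribution which truncates the proportion of disease prevalence $K$, and from Daetwyler et al. [12]

$$\sigma^2_{g_{cc}} = h^2 (1 - h^2 \theta). \qquad (6)$$

In a similar manner, the inflated variance due to non-genetic effects is $\sigma^2_{e_{cc}} = (1-h^2)[1 - (1-h^2)\theta]$. The covariance between disease liability and genetic values in an ascertained case-control sample is

$$cov(l_{cc}, g_{cc}) = \sigma^2_{g_{cc}} + cov(g_{cc}, e_{cc}) \qquad (7)$$

where $cov(g_{cc}, e_{cc}) = (\sigma^2_{l_{cc}} - \sigma^2_{g_{cc}} - \sigma^2_{e_{cc}})/2 = -h^2(1-h^2)\theta$. Therefore, from (5), (6) and (7), $cov(l_{cc}, g_{cc}) = h^2(1-\theta) = h^2 \sigma^2_{l_{cc}}$. The regression coefficient of $l$ on $g$ is $b_{l,g} = cov(l_{cc}, g_{cc}) / \sigma^2_{g_{cc}} = (1-\theta)/(1 - h^2\theta)$. Finally, the proportion of variance attributable to the SNP or set of SNPs for a quantitative response in an ascertained case-control sample can be obtained as the squared regression coefficient multiplied by the genetic variance in the case-control sample and scaled by the variance of disease liability in the case-control sample, i.e. $h^2_{QT_{cc}} = b^2_{l,g}\, \sigma^2_{g_{cc}} / \sigma^2_{l_{cc}} = h^2 (1-\theta)/(1 - h^2\theta)$.

5) NCP for quantitative responses for cases only in ascertained case-control samples. Here, underlying continuous quantitative responses are available only for cases in the ascertained case-control sample, i.e. the recorded values $y$ follow a mixture distribution of zero for controls and a truncated normal distribution for cases. An example may be a GWAS of major depressive disorder in which cases are recorded for a quantitative severity score, whereas controls have not been scored. In this situation,

$$\lambda_{QB\_CC} = \frac{N h^2_{QB_{cc}}}{1 - h^2_{QB_{cc}}} \qquad (8)$$

where $h^2_{QB_{cc}} = b^2_{y,g}\, \sigma^2_{g_{cc}} / \sigma^2_{y_{cc}}$, which is explained as follows. The variance of the mixed zero and truncated normal values in an ascertained case-control sample is $\sigma^2_{y_{cc}} = P[1 - i(i-t)] + P(1-P) i^2$, where $i$ and $t$ are the same as defined above. The assumption here is that the quantitative trait is the phenotypic liability. The covariance between $y_{cc}$ and $g$ in an ascertained case-control sample is $cov(y_{cc}, g_{cc}) = h^2\, cov(y_{cc}, l_{cc})$, where $cov(y_{cc}, l_{cc}) = P(1 + it) - P\, i\, \bar{l}_{cc}$, and $\bar{l}_{cc} = i(P-K)/(1-K)$ is the mean liability in the case-control sample. From the equations above, the regression coefficient of $y_{cc}$ on $g_{cc}$ can be derived analytically as $b_{y,g} = cov(y_{cc}, g_{cc}) / \sigma^2_{g_{cc}}$. Therefore, the proportion of variance attributable to the SNPs for the mixed zero and quantitative response in an ascertained case-control sample ($h^2_{QB_{cc}}$) can be expressed as above under (8).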
The power for this mixed 0 and truncated normal responses is very similar to that for BT_CC (results not shown).
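The expected power for designs (1) to (4) can be sketched numerically as follows. This is a minimal illustration, not the authors' supplementary R code: it assumes an NCP of the form N·r²/(1 − r²), where r² is the variance explained on the analysed scale, uses standard truncated-normal results for the ascertainment term, and replaces R's `pchisq` with the equivalent normal-tail identity for a χ² test with one degree of freedom. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability is assumed to be N(0, 1) in the population


def threshold_terms(K):
    """Return (t, z, i): liability threshold, normal height at t, mean liability of cases."""
    t = STD_NORMAL.inv_cdf(1.0 - K)
    z = STD_NORMAL.pdf(t)
    return t, z, z / K


def theta(K, P):
    """Ascertainment term; liability variance in a case-control sample is 1 - theta."""
    t, _, i = threshold_terms(K)
    m = i * (P - K) / (1.0 - K)  # mean liability in the case-control sample
    return m * (m - t)


def variance_explained(design, h2, K=None, P=None):
    """Proportion of variance explained on the analysed scale (assumed forms)."""
    if design == "QT_POP":
        return h2
    _, z, _ = threshold_terms(K)
    if design == "BT_POP":
        return h2 * z**2 / (K * (1.0 - K))
    th = theta(K, P)
    if design == "BT_CC":
        var_g_cc = h2 * (1.0 - h2 * th)  # genetic variance inflated by ascertainment
        return var_g_cc * z**2 * P * (1.0 - P) / (K * (1.0 - K)) ** 2
    if design == "QT_CC":
        return h2 * (1.0 - th) / (1.0 - h2 * th)
    raise ValueError(f"unknown design: {design}")


def power_from_ncp(ncp, alpha=0.05):
    """Power of a 1-d.f. chi-square test with non-centrality parameter ncp."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # c**2 is the chi-square critical value
    d = sqrt(ncp)
    return STD_NORMAL.cdf(d - c) + STD_NORMAL.cdf(-d - c)


def expected_power(design, N, h2, K=None, P=None, alpha=0.05):
    r2 = variance_explained(design, h2, K, P)
    return power_from_ncp(N * r2 / (1.0 - r2), alpha)


if __name__ == "__main__":
    for design, P in [("BT_POP", None), ("BT_CC", 0.5), ("QT_POP", None), ("QT_CC", 0.5)]:
        print(design, round(expected_power(design, N=2000, h2=0.001, K=0.1, P=P), 3))
```

With N = 2000 and K = 0.1, these functions can be compared against the Exp columns of Table 1; passing `alpha=5e-8` gives power at a genome-wide significance threshold instead.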

Genomic prediction accuracy

Normal quantitative traits

For a quantitative trait, $\beta_j$ is the random allelic substitution effect of the jth single nucleotide polymorphism (SNP). Following Daetwyler et al. [12], the prediction error variance for the jth SNP effect is

$$var(\beta_j - \hat{\beta}_j) = \frac{\sigma^2_e}{N\, var(x_j)}$$

where $\hat{\beta}_j$ is the estimate of the true regression $\beta_j$ of the phenotype on the jth SNP genotype, $x_{ij}$ = 0, 1 or 2 for the ith individual, $N$ is the number of individual records and $\sigma^2_e$ is the residual variance. Assuming a phenotypic variance of one, the genetic variance (var(g)) explained by the set of $M$ SNPs is $h^2$. Following Daetwyler et al. [12], the estimated genetic variance explained by the $M$ SNPs in the predictor (var($\hat{g}$)) is a function of $h^2$, $M$, the number of records ($N$) and the residual variance ($\sigma^2_e$) as

$$var(\hat{g}) = h^2 + \frac{M \sigma^2_e}{N}.$$

The squared correlation coefficient between the true and estimated genetic value is the ratio of the true genetic variance over the estimated genetic variance [12] as

$$r^2_{g\hat{g}} = \frac{h^2}{h^2 + M \sigma^2_e / N}$$

where the residual variance is approximated as $\sigma^2_e = 1$ (the phenotypic variance) as in Daetwyler et al. [12]. With $\tau$ defined as the ratio of the number of samples ($N$) over the number of SNPs ($M$), the accuracy can also be written as [12]

$$r_{g\hat{g}} = \sqrt{\frac{\tau h^2}{\tau h^2 + 1}}. \qquad (9)$$

Disease traits in population sample

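For binary disease traits, with $\sigma^2_e$ approximated as $K(1-K)$ (i.e. the binomial phenotypic variance for a disease with population prevalence $K$), the prediction error variance for the jth SNP effect can be written as

$$var(\beta_j - \hat{\beta}_j) = \frac{K(1-K)}{N\, var(x_j)}$$

where $\beta_j$ is the allele substitution effect on the 0,1 observed scale and $\hat{\beta}_j$ is the estimate of $\beta_j$ from regression of the 0,1 discrete phenotypes on SNP genotypes. The estimated genetic variance on the observed scale of the SNP predictor (var($\hat{u}$)) is a function of the genetic variance on the observed scale (var(u) or $\sigma^2_u$), the number of SNPs, the number of records and the residual variance as

$$var(\hat{u}) = var(u) + \frac{M K(1-K)}{N}.$$

The squared correlation coefficient between the true and estimated genetic values is

$$r^2_{u\hat{u}} = \frac{var(u)}{var(u) + M K(1-K)/N}.$$

Because genetic variance as a proportion of phenotypic variance on the observed scale can be transformed from that on the liability scale as $h^2_o = var(u)/[K(1-K)] = h^2 z^2 / [K(1-K)]$ [13], where $z$ is the height of the normal curve at the threshold truncating the proportion $K$, prediction accuracy can be re-expressed, with $\tau = N/M$, as

$$r_{u\hat{u}} = \sqrt{\frac{\tau h^2 z^2}{\tau h^2 z^2 + K(1-K)}}. \qquad (10)$$

Equation (10) here is the same as equation (6) in Daetwyler et al. [12].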

Disease traits in ascertained case-control study

Ascertainment in case-control samples often results in over-representation of cases compared to the case prevalence in the population. As a result, the variance of the explanatory variable is inflated by a factor $f$ [12], [17], which gives the inflated genetic variance due to ascertainment in the case-control sample [12], and the inflated explanatory variable for the jth SNP replaces $x_j$ in the regression. The prediction error variance for the jth SNP effect then has the same form as in the population sample, where $\beta_j$ is the allele substitution effect on the 0,1 observed scale and $\hat{\beta}_j$ is the estimate of $\beta_j$ from regression of the 0,1 discrete phenotypes on SNP genotypes in the case-control sample. The estimated genetic variance on the observed scale in a case-control design is the sum over SNPs of the genetic variance on the observed scale due to each SNP effect, which can be transformed to the liability scale [14], [17], and the corresponding prediction error variance. With a sufficient number of causal SNPs (>∼20), the residual variance is approximated as $\sigma^2_e = P(1-P)$ (i.e. the binomial phenotypic variance in a case-control sample where the proportion of cases is $P$), and the value for $f$ is close to 1 (i.e. a small fraction of genetic variance has a negligible inflation). Therefore, the genetic variance in a case-control sample is

$$var(u_{cc}) \approx h^2 z^2 \frac{[P(1-P)]^2}{[K(1-K)]^2}$$

where $h^2$ is the proportion of variance explained on the liability scale and $z$ is the height of the normal curve at the threshold truncating the proportion $K$, and the estimated genetic variance in a case-control sample is approximately

$$var(\hat{u}_{cc}) \approx var(u_{cc}) + \frac{M P(1-P)}{N}.$$

The correlation coefficient between the true and estimated genetic values, with $\tau = N/M$, is

$$r_{u\hat{u}} = \sqrt{\frac{\tau h^2 z^2}{\tau h^2 z^2 + [K(1-K)]^2 / [P(1-P)]}}. \qquad (11)$$

Equation (11) differs from equation (9) of Daetwyler et al. [12]. For binary traits, the area under the receiver-operator characteristic curve (AUC) is a useful statistic for genomic prediction accuracy [19], [20]. A relationship between the correlation coefficient and AUC has been shown in previous studies [11], [20].
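The expected accuracies for the three settings can be evaluated with a few lines of code. The sketch below assumes that accuracy takes the Daetwyler-type form sqrt(τ·h²ₓ / (τ·h²ₓ + 1)), where h²ₓ is the variance explained on the analysed scale: h² for a quantitative trait, h²z²/[K(1−K)] for a population sample and h²z²P(1−P)/[K(1−K)]² for a case-control sample with the inflation factor f taken as 1. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_height(K):
    """Height z of the standard normal density at the threshold truncating proportion K."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(1.0 - K))


def accuracy_quantitative(h2, tau):
    """Expected accuracy for a quantitative trait; tau = N / M."""
    return sqrt(tau * h2 / (tau * h2 + 1.0))


def accuracy_population(h2, K, tau):
    """Expected accuracy for a disease trait in a population sample."""
    x = tau * h2 * normal_height(K) ** 2
    return sqrt(x / (x + K * (1.0 - K)))


def accuracy_case_control(h2, K, P, tau):
    """Expected accuracy in an ascertained case-control sample, assuming f = 1."""
    x = tau * h2 * normal_height(K) ** 2
    return sqrt(x / (x + (K * (1.0 - K)) ** 2 / (P * (1.0 - P))))


if __name__ == "__main__":
    # Scenario shared by Tables 2 to 4: h2 = 0.5, K = 0.01, tau = 1, P = 0.5
    print(round(accuracy_population(0.5, 0.01, 1.0), 3))
    print(round(accuracy_case_control(0.5, 0.01, 0.5, 1.0), 3))
```

Under these assumptions the two functions for disease traits coincide when P = K = 0.5, which matches the identical population and case-control expectations in the last row of Table 2.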

Simulation Study

In order to check the analytically derived equations of NCP for BT_POP, BT_CC, QT_POP and QT_CC, we carried out a simulation study. Individual genetic values (g) were simulated from an additive multilocus model of M = 100 independent SNPs with equal allele effects and allele frequency of 0.5. Residual values (e) were independently generated from a normal distribution with a mean of zero and variance of $\sigma^2_e$. The value of $\sigma^2_e$ was set relative to $\sigma^2_g$ so that the desired proportion of variance explained by the markers, h², was obtained. We simulated h² of 0.01, 0.05 and 0.1 so that each SNP explained 0.0001, 0.0005 and 0.001 of the phenotypic variance (Table 1). Liability phenotypes for each individual were simulated as y = g + e. Affected individuals were those whose liability phenotype exceeded a threshold determined by the population prevalence. The total number of cases and controls in the sample was 2000. The values for population prevalence were varied as K = 0.1, 0.01 or 0.001. The proportion of cases was P = K in simulations of population samples and P = 0.5 in simulations of case-control samples, where cases were over-sampled by a factor (1 − K)/K. In population or case-control samples, we used both binary (BT_POP or BT_CC) and quantitative responses (QT_POP or QT_CC). We conducted 100 replicates for each simulation scenario; with 100 SNPs per replicate, 10000 association tests were therefore carried out. Power was calculated as the proportion of the 10000 association tests in which the association p-value was less than 0.05, and was compared to power calculated from the NCP using the R function pchisq, i.e. power = 1 − pchisq(T, 1, ncp = NCP), where T is the χ² critical value with one degree of freedom corresponding to the significance level 0.05 (the square of the two-sided normal threshold).
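The ascertainment step of such a simulation can be illustrated with a much smaller sketch. It simulates only the total liability (not individual SNPs, so it is not the study's simulation), keeps fixed numbers of cases and controls, and compares the sample variance of liability with the value expected from truncated-normal moments, 1 + m·t − m², where m = i(P − K)/(1 − K). Function names and the seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, pvariance


def ascertain_liabilities(K, n_cases, n_controls, rng):
    """Draw N(0, 1) liabilities until the requested numbers of cases and controls are filled."""
    t = NormalDist().inv_cdf(1.0 - K)  # threshold determined by population prevalence
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        liability = rng.gauss(0.0, 1.0)
        if liability > t:
            if len(cases) < n_cases:
                cases.append(liability)
        elif len(controls) < n_controls:
            controls.append(liability)
    return cases, controls


def expected_liability_variance(K, P):
    """Variance of liability in a sample with proportion P of cases (truncated-normal mixture)."""
    std = NormalDist()
    t = std.inv_cdf(1.0 - K)
    i = std.pdf(t) / K           # mean liability of cases
    m = i * (P - K) / (1.0 - K)  # mean liability in the ascertained sample
    return 1.0 + m * t - m * m


if __name__ == "__main__":
    rng = random.Random(2013)
    cases, controls = ascertain_liabilities(K=0.1, n_cases=5000, n_controls=5000, rng=rng)
    print("observed variance:", round(pvariance(cases + controls), 3))
    print("expected variance:", round(expected_liability_variance(0.1, 0.5), 3))
```

The liability variance in the ascertained sample exceeds the population value of 1, which is the inflation that the case-control designs exploit.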

Table 1

Expected power for an association study from the derived equations and observed averaged power from simulation.

	BT_POP		BT_CC^a		QT_POP		QT_CC^a
h²	Exp	Obs (SE)	Exp	Obs (SE)	Exp	Obs (SE)	Exp	Obs (SE)
N = 2000, K = 0.1
0.0001	0.058	0.053 (0.002)	0.072	0.072 (0.003)	0.073	0.075 (0.003)	0.082	0.083 (0.003)
0.0005	0.090	0.086 (0.003)	0.164	0.163 (0.004)	0.170	0.172 (0.004)	0.218	0.221 (0.004)
0.001	0.131	0.130 (0.003)	0.281	0.286 (0.005)	0.293	0.294 (0.005)	0.386	0.386 (0.005)
N = 2000, K = 0.01
0.0001	0.052	0.057 (0.002)	0.092	0.092 (0.003)	0.073	0.075 (0.003)	0.105	0.102 (0.003)
0.0005	0.058	0.057 (0.002)	0.270	0.267 (0.004)	0.170	0.169 (0.004)	0.333	0.329 (0.005)
0.001	0.067	0.066 (0.002)	0.478	0.474 (0.005)	0.293	0.295 (0.005)	0.579	0.574 (0.005)
N = 2000, K = 0.001
0.0001	0.050	0.042 (0.002)	0.117	0.117 (0.003)	0.073	0.075 (0.003)	0.130	0.132 (0.003)
0.0005	0.051	0.052 (0.002)	0.392	0.387 (0.005)	0.170	0.176 (0.004)	0.451	0.451 (0.005)
0.001	0.053	0.052 (0.002)	0.664	0.657 (0.005)	0.293	0.296 (0.005)	0.738	0.733 (0.004)

h²: variance explained by the locus.

a: in case-control samples, 50% of the sample are cases, P = 0.5.

Exp: Expected power based on NCP derived from equations (1) to (4).

Obs: Averaged power over 10000 replicates of simulation.

SE: Empirical standard error over 10000 replicates.

Simulations were carried out to verify the validity of equations (10) and (11). Individual genetic values (g) were simulated from an additive multilocus model with equal allele effects (allele frequency of ∼0.5) and residual values (e) were independently generated from a normal distribution with a mean of zero and variance of $\sigma^2_e$. The value of $\sigma^2_e$ was set relative to $\sigma^2_g$ so that the desired proportion of variance explained by the markers, h², was obtained. Liability phenotypes for each individual were simulated as y = g + e. Affected individuals were those whose liability phenotype exceeded a threshold determined by the population prevalence. Population prevalences of K = 0.001, 0.01, 0.1, 0.2 and 0.5 were used with N = 2000 and M = 2000. To vary τ = N/M, N = 2000 and M = 400 were used for τ = 5, and N = 100 and M = 5000 were used for τ = 0.02. Following Daetwyler et al. [12], allele substitution effects ($\hat{\beta}$) were estimated using a regression analysis for each simulated SNP. As a validation set, a second sample of individuals was generated based on the same genetic parameters as in the original population. Empirical prediction accuracy was obtained by correlating the true genetic values (g) and estimated genetic values ($\hat{g}$) in the validation set.

Results

The power of association tests observed in simulation and expected from theory agreed well under a range of scenarios (Table 1). Whether using lower or higher values of disease prevalence K, there was excellent agreement between the observed and expected power, with a small empirical standard error. When the variance explained by each locus (h²) was higher, the empirical standard error increased slightly, but the observed value still agreed well with the expected value (Exp and Obs in Table 1). In Figure 1, values for the power based on the NCP derived from equations (1) to (4) are plotted against the variance explained by SNPs (i.e. h²). Generally, the power increases when the variance explained by SNPs increases, and when the ascertained case-control design is used. For BT_POP, the power decreases as K decreases, reflecting the smaller number of cases in a given population sample. For QT_POP, the power is, of course, constant across panels a to c in Figure 1. When using an ascertained sample (BT_CC or QT_CC), the power increases as the value for K decreases, which reflects the greater over-sampling of cases with lower K for the same sample size, and hence the larger difference in mean liability between cases and controls. There is a moderate difference between BT_CC and QT_CC when using population prevalence K = 0.1 (a in Figure 1). The difference between BT_CC and QT_CC becomes smaller with lower values for K (b and c in Figure 1).

Figure 1

Power derived for QT_POP (dotted line), BT_POP (solid line), BT_CC (dashed line) and QT_CC (dot-dashed line) when using population prevalence K = 0.1 (a), K = 0.01 (b) or K = 0.001 (c), assuming the same total sample size N = 2000 and a critical significance threshold of 5×10⁻⁸.

The expected accuracies predicted from equations (10) and (11) agreed well with the observed average of estimates from simulation for all simulation scenarios, for both population and ascertained case-control samples. In Table 2 the disease prevalence K varies, in Table 3 the proportion of variance explained by SNPs (h²) varies and in Table 4 the value of τ = N/M varies. For comparison, we also list the predicted accuracies for case-control samples from the formula of Daetwyler et al. [12]. As shown in their Table 4, their formula underestimates prediction accuracy, particularly when disease prevalence is low (Table 2) and h² is high (Table 3). We also tested the prediction accuracy with allele effects sampled from a normal or an exponential distribution. The results from these alternative distributions of allele effects were not much different from the main results (results not shown). This agrees with Daetwyler et al. [12] in that the derived prediction accuracy is robust to the distributional assumption for allele effects.

Table 2

Prediction accuracy for a disease with population or case-control samples when true proportion of variance explained by the set of SNPs on the liability scale is 0.5, τ = N/M is 1 for different disease prevalences.

Prevalence	Population		Case-Control
	Exp1	Est (se)	Exp2	Exp3	Est (se)
0.001	0.075	0.063 (0.004)	0.628	0.766	0.767 (0.002)
0.01	0.186	0.183 (0.003)	0.594	0.689	0.690 (0.002)
0.1	0.382	0.377 (0.003)	0.533	0.568	0.570 (0.002)
0.2	0.444	0.438 (0.003)	0.511	0.526	0.529 (0.003)
0.5	0.491	0.487 (0.003)	0.491	0.491	0.487 (0.003)

Exp1: Expected value from equation (10), or equation (6) of Daetwyler et al. (2008).

Exp2: Expected value from equation (9) of Daetwyler et al. (2008).

Exp3: Expected value from equation (11).

Est: Average of estimates from 100 replicates.

se: Empirical standard error over 100 replicates.

Proportion of cases in case-control study is P = 0.5.

Table 3

Prediction accuracy for a disease with population or case-control samples when prevalence is 0.01 and τ = N/M is 1, for diseases with different h².

h²	Population		Case-Control
	Exp1	Est (se)	Exp2	Exp3	Est (se)
0.1	0.084	0.087 (0.004)	0.371	0.392	0.395 (0.003)
0.5	0.186	0.183 (0.003)	0.594	0.689	0.690 (0.002)
0.9	0.246	0.243 (0.003)	0.653	0.787	0.787 (0.001)

Exp1: Expected value from equation (10), or equation (6) of Daetwyler et al. (2008).

Exp2: Expected value from equation (9) of Daetwyler et al. (2008).

Exp3: Expected value from equation (11).

Est: Average of estimates from 100 replicates.

se: Empirical standard error over 100 replicates.

Proportion of cases in case-control study is P = 0.5.

Table 4

Prediction accuracy for a disease with population or case-control samples when true proportion of variance explained by the set of SNPs on the liability scale is 0.5, prevalence is 0.01 and τ = N/M varies.

τ = N/M	Population		Case-Control
	Exp1	Est (se)	Exp2	Exp3	Est (se)
0.02	0.027	0.028 (0.003)	0.104	0.133	0.124 (0.004)
1	0.186	0.183 (0.003)	0.594	0.689	0.690 (0.002)
5	0.390	0.389 (0.004)	0.731	0.905	0.905 (0.001)

Exp1: Expected value from equation (10), or equation (6) of Daetwyler et al. (2008).

Exp2: Expected value from equation (9) of Daetwyler et al. (2008).

Exp3: Expected value from equation (11).

Est: Average of estimates from 100 replicates.

se: Empirical standard error over 100 replicates.

Proportion of cases in case-control study is P = 0.5.

Discussion

Firstly, we provide analytical derivations in a unified framework to quantify the power of GWAS when using population or ascertained case-control samples with binary or quantitative responses. The derived equations were validated in a simulation study, which showed that expected values from the equations and observed values from simulations agreed well. Secondly, following Daetwyler et al. [12], we derived an expression for genomic prediction accuracy based on the 0,1 observed scale, and transformed it to the liability scale using a liability threshold model for disease traits in population [13] and in case-control samples [14]. Compared with Daetwyler et al. [12], our derivation agrees for population samples, but is more accurate for case-control samples. The Genetic Power Calculator [21] is commonly used for calculation of power in genetic association studies. The calculator is based on theoretical derivations [22], [23] of a single locus model with required parameters of allele frequency and effect size (e.g. relative risk or odds ratio for binary responses). However, our derivations and their application do not require those parameters (see equations (3), (4) and (8), and Appendix S1 and S2 for application) because they are based on the variance explained by a locus, and many combinations of allele frequency and effect size can generate the same variance explained. Our framework easily accommodates power of association of multiple loci because we use a single parameter for the total variance that is generated by any number of loci. Applications of multiple-locus association in GWAS have been published recently [24], [25]. In practice, the power to detect causal variants may not exactly agree with our analytical derivations because of unknown parameters, such as linkage disequilibrium among variants and the distribution of effect sizes, that alter the effective number of tests.
We recommend that such unknown parameters be carefully considered when applying power calculations. Recently, Dudbridge [11] presented a comprehensive study of the power and predictive accuracy of polygenic scores. Our equation (11) and Dudbridge's equation (13) [11] are analogous to each other. However, Dudbridge used his equation (13) with a heuristic justification from simulations. We analytically derived equation (11) based on a liability threshold model and gave a reasonable explanation of why f is approximated as 1. Lastly, van der Sluis et al. [7] quantified by simulation the power lost in genetic association analyses of population samples measured for quantitative endophenotypes but analysed with a dichotomous case-control score. Our analytical derivations for such scenarios allow easy generalization of their results to the design of new studies.

R code for the power derivations described in the paper. (DOC)

R code for the prediction accuracy derivations described in the paper. (DOC)
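The point made in the Discussion that many combinations of allele frequency and effect size generate the same variance explained can be illustrated with a short sketch. It assumes a biallelic SNP with additive effects, Hardy-Weinberg proportions and a phenotype scaled to unit variance, so that the variance explained is 2p(1 − p)β²; the function names are hypothetical.

```python
from math import sqrt


def variance_explained(p, beta):
    """Variance explained by an additive SNP with allele frequency p and effect beta.

    Assumes Hardy-Weinberg proportions and a phenotype with unit variance.
    """
    return 2.0 * p * (1.0 - p) * beta ** 2


def effect_for_variance(p, h2):
    """Allele substitution effect needed for a SNP of frequency p to explain h2."""
    return sqrt(h2 / (2.0 * p * (1.0 - p)))


if __name__ == "__main__":
    target = 0.001  # the same variance explained for every SNP below
    for p in (0.05, 0.2, 0.5):
        beta = effect_for_variance(p, target)
        print(f"p = {p:.2f}  beta = {beta:.4f}  h2 = {variance_explained(p, beta):.4f}")
```

A rare allele needs a larger effect than a common one to explain the same variance, yet under a variance-based parameterisation all such loci have the same expected power.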

23 in total

1. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits.

Authors: S Purcell; S S Cherny; P C Sham
Journal: Bioinformatics Date: 2003-01 Impact factor: 6.937

2. A better coefficient of determination for genetic profile analysis.

Authors: Sang Hong Lee; Michael E Goddard; Naomi R Wray; Peter M Visscher
Journal: Genet Epidemiol Date: 2012-04 Impact factor: 2.135

3. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors: Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal: Proc Natl Acad Sci U S A Date: 2009-05-27 Impact factor: 11.205

4. A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging.

Authors: Benjamin A Logsdon; Cara L Carty; Alexander P Reiner; James Y Dai; Charles Kooperberg
Journal: Bioinformatics Date: 2012-05-04 Impact factor: 6.937

5. Power in GWAS: lifting the curse of the clinical cut-off.

Authors: S van der Sluis; D Posthuma; M G Nivard; M Verhage; C V Dolan
Journal: Mol Psychiatry Date: 2012-05-22 Impact factor: 15.992

Review 6. The endophenotype concept in psychiatry: etymology and strategic intentions.

Authors: Irving I Gottesman; Todd D Gould
Journal: Am J Psychiatry Date: 2003-04 Impact factor: 18.112

7. Multi-locus test conditional on confirmed effects leads to increased power in genome-wide association studies.

Authors: Li Ma; Shizhong Han; Jing Yang; Yang Da
Journal: PLoS One Date: 2010-11-16 Impact factor: 3.240

8. Hundreds of variants clustered in genomic loci and biological pathways affect human height.

Authors: Hana Lango Allen; Karol Estrada; Guillaume Lettre; Sonja I Berndt; Michael N Weedon; Fernando Rivadeneira; Cristen J Willer; Anne U Jackson; Sailaja Vedantam; Soumya Raychaudhuri; Teresa Ferreira; Andrew R Wood; Robert J Weyant; Ayellet V Segrè; Elizabeth K Speliotes; Eleanor Wheeler; Nicole Soranzo; Ju-Hyun Park; Jian Yang; Daniel Gudbjartsson; Nancy L Heard-Costa; Joshua C Randall; Lu Qi; Albert Vernon Smith; Reedik Mägi; Tomi Pastinen; Liming Liang; Iris M Heid; Jian'an Luan; Gudmar Thorleifsson; Thomas W Winkler; Michael E Goddard; Ken Sin Lo; Cameron Palmer; Tsegaselassie Workalemahu; Yurii S Aulchenko; Asa Johansson; M Carola Zillikens; Mary F Feitosa; Tõnu Esko; Toby Johnson; Shamika Ketkar; Peter Kraft; Massimo Mangino; Inga Prokopenko; Devin Absher; Eva Albrecht; Florian Ernst; Nicole L Glazer; Caroline Hayward; Jouke-Jan Hottenga; Kevin B Jacobs; Joshua W Knowles; Zoltán Kutalik; Keri L Monda; Ozren Polasek; Michael Preuss; Nigel W Rayner; Neil R Robertson; Valgerdur Steinthorsdottir; Jonathan P Tyrer; Benjamin F Voight; Fredrik Wiklund; Jianfeng Xu; Jing Hua Zhao; Dale R Nyholt; Niina Pellikka; Markus Perola; John R B Perry; Ida Surakka; Mari-Liis Tammesoo; Elizabeth L Altmaier; Najaf Amin; Thor Aspelund; Tushar Bhangale; Gabrielle Boucher; Daniel I Chasman; Constance Chen; Lachlan Coin; Matthew N Cooper; Anna L Dixon; Quince Gibson; Elin Grundberg; Ke Hao; M Juhani Junttila; Lee M Kaplan; Johannes Kettunen; Inke R König; Tony Kwan; Robert W Lawrence; Douglas F Levinson; Mattias Lorentzon; Barbara McKnight; Andrew P Morris; Martina Müller; Julius Suh Ngwa; Shaun Purcell; Suzanne Rafelt; Rany M Salem; Erika Salvi; Serena Sanna; Jianxin Shi; Ulla Sovio; John R Thompson; Michael C Turchin; Liesbeth Vandenput; Dominique J Verlaan; Veronique Vitart; Charles C White; Andreas Ziegler; Peter Almgren; Anthony J Balmforth; Harry Campbell; Lorena Citterio; Alessandro De Grandi; Anna Dominiczak; Jubao Duan; Paul Elliott; Roberto Elosua; Johan G Eriksson; 
Nelson B Freimer; Eco J C Geus; Nicola Glorioso; Shen Haiqing; Anna-Liisa Hartikainen; Aki S Havulinna; Andrew A Hicks; Jennie Hui; Wilmar Igl; Thomas Illig; Antti Jula; Eero Kajantie; Tuomas O Kilpeläinen; Markku Koiranen; Ivana Kolcic; Seppo Koskinen; Peter Kovacs; Jaana Laitinen; Jianjun Liu; Marja-Liisa Lokki; Ana Marusic; Andrea Maschio; Thomas Meitinger; Antonella Mulas; Guillaume Paré; Alex N Parker; John F Peden; Astrid Petersmann; Irene Pichler; Kirsi H Pietiläinen; Anneli Pouta; Martin Ridderstråle; Jerome I Rotter; Jennifer G Sambrook; Alan R Sanders; Carsten Oliver Schmidt; Juha Sinisalo; Jan H Smit; Heather M Stringham; G Bragi Walters; Elisabeth Widen; Sarah H Wild; Gonneke Willemsen; Laura Zagato; Lina Zgaga; Paavo Zitting; Helene Alavere; Martin Farrall; Wendy L McArdle; Mari Nelis; Marjolein J Peters; Samuli Ripatti; Joyce B J van Meurs; Katja K Aben; Kristin G Ardlie; Jacques S Beckmann; John P Beilby; Richard N Bergman; Sven Bergmann; Francis S Collins; Daniele Cusi; Martin den Heijer; Gudny Eiriksdottir; Pablo V Gejman; Alistair S Hall; Anders Hamsten; Heikki V Huikuri; Carlos Iribarren; Mika Kähönen; Jaakko Kaprio; Sekar Kathiresan; Lambertus Kiemeney; Thomas Kocher; Lenore J Launer; Terho Lehtimäki; Olle Melander; Tom H Mosley; Arthur W Musk; Markku S Nieminen; Christopher J O'Donnell; Claes Ohlsson; Ben Oostra; Lyle J Palmer; Olli Raitakari; Paul M Ridker; John D Rioux; Aila Rissanen; Carlo Rivolta; Heribert Schunkert; Alan R Shuldiner; David S Siscovick; Michael Stumvoll; Anke Tönjes; Jaakko Tuomilehto; Gert-Jan van Ommen; Jorma Viikari; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael A Province; Manfred Kayser; Alice M Arnold; Larry D Atwood; Eric Boerwinkle; Stephen J Chanock; Panos Deloukas; Christian Gieger; Henrik Grönberg; Per Hall; Andrew T Hattersley; Christian Hengstenberg; Wolfgang Hoffman; G Mark Lathrop; Veikko Salomaa; Stefan Schreiber; Manuela Uda; Dawn Waterworth; Alan F Wright; Themistocles L Assimes; Inês 
Barroso; Albert Hofman; Karen L Mohlke; Dorret I Boomsma; Mark J Caulfield; L Adrienne Cupples; Jeanette Erdmann; Caroline S Fox; Vilmundur Gudnason; Ulf Gyllensten; Tamara B Harris; Richard B Hayes; Marjo-Riitta Jarvelin; Vincent Mooser; Patricia B Munroe; Willem H Ouwehand; Brenda W Penninx; Peter P Pramstaller; Thomas Quertermous; Igor Rudan; Nilesh J Samani; Timothy D Spector; Henry Völzke; Hugh Watkins; James F Wilson; Leif C Groop; Talin Haritunians; Frank B Hu; Robert C Kaplan; Andres Metspalu; Kari E North; David Schlessinger; Nicholas J Wareham; David J Hunter; Jeffrey R O'Connell; David P Strachan; H-Erich Wichmann; Ingrid B Borecki; Cornelia M van Duijn; Eric E Schadt; Unnur Thorsteinsdottir; Leena Peltonen; André G Uitterlinden; Peter M Visscher; Nilanjan Chatterjee; Ruth J F Loos; Michael Boehnke; Mark I McCarthy; Erik Ingelsson; Cecilia M Lindgren; Gonçalo R Abecasis; Kari Stefansson; Timothy M Frayling; Joel N Hirschhorn
Journal: Nature Date: 2010-09-29 Impact factor: 49.962

9. Biological, clinical and population relevance of 95 loci for blood lipids.

Authors: Tanya M Teslovich; Kiran Musunuru; Albert V Smith; Andrew C Edmondson; Ioannis M Stylianou; Masahiro Koseki; James P Pirruccello; Samuli Ripatti; Daniel I Chasman; Cristen J Willer; Christopher T Johansen; Sigrid W Fouchier; Aaron Isaacs; Gina M Peloso; Maja Barbalic; Sally L Ricketts; Joshua C Bis; Yurii S Aulchenko; Gudmar Thorleifsson; Mary F Feitosa; John Chambers; Marju Orho-Melander; Olle Melander; Toby Johnson; Xiaohui Li; Xiuqing Guo; Mingyao Li; Yoon Shin Cho; Min Jin Go; Young Jin Kim; Jong-Young Lee; Taesung Park; Kyunga Kim; Xueling Sim; Rick Twee-Hee Ong; Damien C Croteau-Chonka; Leslie A Lange; Joshua D Smith; Kijoung Song; Jing Hua Zhao; Xin Yuan; Jian'an Luan; Claudia Lamina; Andreas Ziegler; Weihua Zhang; Robert Y L Zee; Alan F Wright; Jacqueline C M Witteman; James F Wilson; Gonneke Willemsen; H-Erich Wichmann; John B Whitfield; Dawn M Waterworth; Nicholas J Wareham; Gérard Waeber; Peter Vollenweider; Benjamin F Voight; Veronique Vitart; Andre G Uitterlinden; Manuela Uda; Jaakko Tuomilehto; John R Thompson; Toshiko Tanaka; Ida Surakka; Heather M Stringham; Tim D Spector; Nicole Soranzo; Johannes H Smit; Juha Sinisalo; Kaisa Silander; Eric J G Sijbrands; Angelo Scuteri; James Scott; David Schlessinger; Serena Sanna; Veikko Salomaa; Juha Saharinen; Chiara Sabatti; Aimo Ruokonen; Igor Rudan; Lynda M Rose; Robert Roberts; Mark Rieder; Bruce M Psaty; Peter P Pramstaller; Irene Pichler; Markus Perola; Brenda W J H Penninx; Nancy L Pedersen; Cristian Pattaro; Alex N Parker; Guillaume Pare; Ben A Oostra; Christopher J O'Donnell; Markku S Nieminen; Deborah A Nickerson; Grant W Montgomery; Thomas Meitinger; Ruth McPherson; Mark I McCarthy; Wendy McArdle; David Masson; Nicholas G Martin; Fabio Marroni; Massimo Mangino; Patrik K E Magnusson; Gavin Lucas; Robert Luben; Ruth J F Loos; Marja-Liisa Lokki; Guillaume Lettre; Claudia Langenberg; Lenore J Launer; Edward G Lakatta; Reijo Laaksonen; Kirsten O Kyvik; Florian Kronenberg; Inke R König; Kay-Tee Khaw; Jaakko Kaprio; Lee M Kaplan; Asa Johansson; Marjo-Riitta Jarvelin; A Cecile J W Janssens; Erik Ingelsson; Wilmar Igl; G Kees Hovingh; Jouke-Jan Hottenga; Albert Hofman; Andrew A Hicks; Christian Hengstenberg; Iris M Heid; Caroline Hayward; Aki S Havulinna; Nicholas D Hastie; Tamara B Harris; Talin Haritunians; Alistair S Hall; Ulf Gyllensten; Candace Guiducci; Leif C Groop; Elena Gonzalez; Christian Gieger; Nelson B Freimer; Luigi Ferrucci; Jeanette Erdmann; Paul Elliott; Kenechi G Ejebe; Angela Döring; Anna F Dominiczak; Serkalem Demissie; Panagiotis Deloukas; Eco J C de Geus; Ulf de Faire; Gabriel Crawford; Francis S Collins; Yii-der I Chen; Mark J Caulfield; Harry Campbell; Noel P Burtt; Lori L Bonnycastle; Dorret I Boomsma; S Matthijs Boekholdt; Richard N Bergman; Inês Barroso; Stefania Bandinelli; Christie M Ballantyne; Themistocles L Assimes; Thomas Quertermous; David Altshuler; Mark Seielstad; Tien Y Wong; E-Shyong Tai; Alan B Feranil; Christopher W Kuzawa; Linda S Adair; Herman A Taylor; Ingrid B Borecki; Stacey B Gabriel; James G Wilson; Hilma Holm; Unnur Thorsteinsdottir; Vilmundur Gudnason; Ronald M Krauss; Karen L Mohlke; Jose M Ordovas; Patricia B Munroe; Jaspal S Kooner; Alan R Tall; Robert A Hegele; John J P Kastelein; Eric E Schadt; Jerome I Rotter; Eric Boerwinkle; David P Strachan; Vincent Mooser; Kari Stefansson; Muredach P Reilly; Nilesh J Samani; Heribert Schunkert; L Adrienne Cupples; Manjinder S Sandhu; Paul M Ridker; Daniel J Rader; Cornelia M van Duijn; Leena Peltonen; Gonçalo R Abecasis; Michael Boehnke; Sekar Kathiresan
Journal: Nature Date: 2010-08-05 Impact factor: 49.962

10. Power and predictive accuracy of polygenic risk scores.

Authors: Frank Dudbridge
Journal: PLoS Genet Date: 2013-03-21 Impact factor: 5.917

19 in total

Review 1. Complex Trait Prediction from Genome Data: Contrasting EBV in Livestock to PRS in Humans: Genomic Prediction.

Authors: Naomi R Wray; Kathryn E Kemper; Benjamin J Hayes; Michael E Goddard; Peter M Visscher
Journal: Genetics Date: 2019-04 Impact factor: 4.562

Review 2. Searching for the human genetic factors standing in the way of universally effective vaccines.

Authors: Alexander J Mentzer; Daniel O'Connor; Andrew J Pollard; Adrian V S Hill
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-06-19 Impact factor: 6.237

3. Genome-wide Association of Endophenotypes for Schizophrenia From the Consortium on the Genetics of Schizophrenia (COGS) Study.

Authors: Tiffany A Greenwood; Laura C Lazzeroni; Adam X Maihofer; Neal R Swerdlow; Monica E Calkins; Robert Freedman; Michael F Green; Gregory A Light; Caroline M Nievergelt; Keith H Nuechterlein; Allen D Radant; Larry J Siever; Jeremy M Silverman; William S Stone; Catherine A Sugar; Debby W Tsuang; Ming T Tsuang; Bruce I Turetsky; Ruben C Gur; Raquel E Gur; David L Braff
Journal: JAMA Psychiatry Date: 2019-12-01 Impact factor: 21.596

Review 4. Developing and evaluating polygenic risk prediction models for stratified disease prevention.

Authors: Nilanjan Chatterjee; Jianxin Shi; Montserrat García-Closas
Journal: Nat Rev Genet Date: 2016-05-03 Impact factor: 53.242

5. Systems-Level Analysis of Genetic Variants Reveals Functional and Spatiotemporal Context in Treatment-resistant Schizophrenia.

Authors: Fernanda Talarico; Giovany Oliveira Costa; Vanessa Kiyomi Ota; Marcos Leite Santoro; Cristiano Noto; Ary Gadelha; Rodrigo Bressan; Hatylas Azevedo; Sintia Iole Belangero
Journal: Mol Neurobiol Date: 2022-03-12 Impact factor: 5.590

6. Estimation and partitioning of (co)heritability of inflammatory bowel disease from GWAS and immunochip data.

Authors: Guo-Bo Chen; Sang Hong Lee; Marie-Jo A Brion; Grant W Montgomery; Naomi R Wray; Graham L Radford-Smith; Peter M Visscher
Journal: Hum Mol Genet Date: 2014-04-11 Impact factor: 5.121

7. Assessing the Probability that a Finding Is Genuine for Large-Scale Genetic Association Studies.

Authors: Chia-Ling Kuo; Olga A Vsevolozhskaya; Dmitri V Zaykin
Journal: PLoS One Date: 2015-05-08 Impact factor: 3.240

8. Investigation of glycaemic traits in psychiatric disorders using Mendelian randomisation revealed a causal relationship with anorexia nervosa.

Authors: Danielle M Adams; William R Reay; Michael P Geaghan; Murray J Cairns
Journal: Neuropsychopharmacology Date: 2020-09-13 Impact factor: 7.853

9. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder.

Authors: Robert Maier; Gerhard Moser; Guo-Bo Chen; Stephan Ripke; William Coryell; James B Potash; William A Scheftner; Jianxin Shi; Myrna M Weissman; Christina M Hultman; Mikael Landén; Douglas F Levinson; Kenneth S Kendler; Jordan W Smoller; Naomi R Wray; S Hong Lee
Journal: Am J Hum Genet Date: 2015-01-29 Impact factor: 11.043

10. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

Authors: Sungho Won; Hosik Choi; Suyeon Park; Juyoung Lee; Changyi Park; Sunghoon Kwon
Journal: Biomed Res Int Date: 2015-08-04 Impact factor: 3.411