Efficient Software for Multi-marker, Region-Based Analysis of GWAS Data.

Jaleal S Sanjak¹, Anthony D Long², Kevin R Thornton².

Abstract

Genome-wide association studies (GWAS) have associated many single variants with complex disease, yet the better part of heritable complex disease risk remains unexplained. Analytical tools designed to work under specific population genetic models are needed. Rare variants are increasingly shown to be important in human complex disease, but most existing GWAS data do not cover rare variants. Explicit population genetic models predict that genes contributing to complex traits and experiencing recurrent, unconditionally deleterious mutation will harbor multiple rare, causative mutations of subtle effect. It is difficult to identify genes harboring rare variants of large effect that contribute to complex disease risk via the single-marker association tests typically used in GWAS. Gene/region-based association tests may have the power to detect associations by combining information from multiple markers, but have yielded limited success in practice. This is partially because many methods have not been widely applied. Here, we empirically demonstrate the utility of a procedure based on the rank truncated product (RTP) method, filtered to reduce the effects of linkage disequilibrium. We apply the procedure to the Wellcome Trust Case Control Consortium (WTCCC) data set, and uncover previously unidentified associations, some of which have been replicated in much larger studies. We show that, in the absence of significant rare variant coverage, RTP-based methods still have the power to detect associated genes. We recommend that RTP-based methods be applied to all existing GWAS data to maximize the usefulness of those data. For this, we provide efficient software implementing our procedure.

Keywords: GWAS; gene-based rare variants

Year: 2016 PMID: 26865700 PMCID: PMC4825638 DOI: 10.1534/g3.115.026013

Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154

Revealing the genetic basis of common human diseases, such as diabetes and heart disease, remains a central challenge in human genetics. Family-based and twin-based studies estimate that the genetic component of disease risk is typically large. Genome-wide association studies (GWAS) have identified many genetic variants associated with complex human diseases (Welter ), yet the heritability explained by specific statistically significant variants remains small in comparison to the total heritability estimates (Manolio ; Visscher ). Various hypotheses explaining the “missing heritability problem” exist (Manolio ; Visscher ; Gibson 2012; Robinson ). Gene-by-gene, gene-by-environment, and other complex epistatic interactions might create statistical challenges for the detection of causal variants (Eichler ; Wei ), or might inflate total heritability estimates (Zuk ). The missing heritability could be attributable to many common well-tagged variants that do not reach statistical significance because of their minuscule effect sizes (Fisher 1930; Visscher ). Rare variants with large effects (RALE) might drive heritability and escape detection because they are not well-tagged by current genotyping methods (McClellan and King 2010; Cirulli and Goldstein 2010). Quantifying the roles of these nonmutually exclusive hypotheses is important for the design of future studies, and the development of new analytical tools (Visscher ). We still do not know exactly how mutational effect sizes underlying specific diseases map onto the human site-frequency spectrum. However, it is becoming increasingly clear that rare variants are an important contributor to the genetic basis of complex diseases (Auer ; Prescott ; Wessel and Goodarzi 2015; Purcell ; Cruchaga ; Huyghe ; Nelson ; Johansen ). The RALE hypothesis is particularly appealing to some because it is a prediction that arises naturally from population-genetic models of mutation-selection balance (Haldane 1927).
Specifically, it arises from a model in which equilibrium allele frequencies and phenotypic effect sizes both reflect a balance between two things: recurrent unconditionally deleterious mutations occurring in a disease gene, and their elimination by natural selection (Pritchard 2001). A previous simulation study (Thornton ) investigated a novel model where standing quantitative genetic variation in complex disease genes of large effect is maintained via partially noncomplementing mutations. An important prediction of this model is that a gene region can harbor several, individually rare, variants which all contribute to a complex disease phenotype. Such allelic heterogeneity is predicted to pose complications for genome wide association studies (McClellan and King 2010). In particular, we know that single-marker association tests do not have sufficient statistical power in these cases (Johnston ; Sham and Purcell 2014; Spencer ). Further, associations under this model are a mixture of two different types (Thornton ). First, associations may be due to tagging a causal marker whose effect size is small, implying a sufficiently small effect on fitness, allowing the mutation to reach intermediate frequency ( in the population). The second class of association is due to noncausative mutations in linkage disequilibrium (LD) with causal markers. These “tagged” associations tend to be rare, and of relatively large effect (Thornton ). Under this model, “missing heritability” arises from a combination of allelic heterogeneity, and a lack of power to identify risk variants. Under the model of noncomplementing mutations, regions harboring risk alleles show a statistical signature of a large number of markers with single-marker p-values approaching, but still below, a genome-wide significance threshold (Thornton ). 
These latter authors further showed that, under this model, the excess of significant markers (ESM) test, a permutation-based regional association test, had more power to detect a causal gene region in typical GWAS data than single marker methods, and many popular region-based tests (Thornton ), even for GWAS containing only common markers (). Although the test statistic of the ESM test is inspired by order statistics, under the permutation procedure for evaluating statistical significance, it is equivalent to the rank truncated product (RTP) of p-values (Dudbridge and Koeleman 2003). This equivalence was not initially recognized by Thornton ). Multiple variations on the RTP exist to address issues related to correlation between p-values (De la Cruz ), and the need to specify a truncation threshold (Yu ). Although the RTP test has been used recently to obtain pathway- or gene-level associations in GWAS, and other, genomic applications (Meyer ; Brenner ; Ahsan ; Li ; Lee ; Arem ; Lai ), it is not widely used. Here, we demonstrate the utility of mining existing datasets with an RTP approach, which we call the ESM test from here on, and provide an efficient implementation that can perform genome-wide scans without the need to restrict only to coding regions. GWAS data do not have sufficient coverage of rare variants for direct analysis, but the ESM test is a powerful tool for extracting useful information despite this fact. Here we perform an empirical analysis of the performance of the ESM test on the Wellcome Trust Case Control Consortium (WTCCC) GWAS data set (Wellcome ). We chose this dataset to determine the empirical efficacy of the ESM test because the dataset is well-characterized and easy to obtain. 
In addition, the choice of a dataset without substantial rare variant coverage allows us to show that the ESM test has the power to detect the slight differences in allele frequencies between cases and controls at common neutral markers, which are predicted by RALE models. We discover four novel gene regions that contribute to complex disease variation not detected in the original study, and propose that the ESM test is even better suited to data sets that employ more modern, denser SNP chips.

Materials and Methods

Dataset

Data were obtained from the Wellcome Trust Case Control Consortium (http://www.wtccc.org.uk/), and are as described in Wellcome ). Briefly, we obtained cases for each of seven diseases, and a set of shared controls typed on an Affymetrix 500K SNP chip. Diseases included in the dataset are Bipolar Disorder (BD), Coronary Artery Disease (CAD), Hypertension (HT), Crohn’s disease (IBD), Rheumatoid Arthritis (RA), Type 1 Diabetes (T1D), and Type 2 Diabetes (T2D). Case and control samples were obtained from across Great Britain. Control samples contain two subgroups: individuals from the 1958 British Birth Cohort (1958BC), and individuals from the national UK Blood Services donor pool (NBS).

Data preprocessing

The raw WTCCC data were formatted for use in PLINK 1.90a (Purcell ). Single nucleotide polymorphisms (SNPs) listed in the WTCCC genotype file by their Affymetrix identification were translated into RefSNP (rsID) with the Affymetrix chip annotations. The SNP identifications and chromosome positions were updated to the most recent dbSNP Build 144. The SNP and individual exclusions lists provided were applied, and only genotyping calls with quality score over 0.9 were included.

Basic association and permutation

The basic single-marker association test was executed with the PLINK 1.90a command --assoc. A total of N permuted single-marker p-values were obtained from PLINK 1.90a by specifying --mperm N. We take 2 million permutations (see Results), such that the resolution of our permutation p-value is 5e–7, which allows us to establish a region as genome-wide significant below a marginal p-value threshold of 1e–6. We stored the observed association p-values, the permuted association p-values, and the pairwise LD between markers (from the PLINK --ld command) in HDF5 file format for use in the ESM test.
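The link between the number of permutations and the smallest p-value that can be resolved is simple arithmetic, sketched below in Python. The function names are hypothetical and are not part of the authors' software; the only inputs taken from the article are the two million permutations and the 1e-6 threshold reported in the Results.

```python
def permutation_resolution(n_permutations: int) -> float:
    """Smallest nonzero p-value step obtainable from n_permutations permutations."""
    if n_permutations <= 0:
        raise ValueError("n_permutations must be positive")
    return 1.0 / n_permutations


def can_resolve(threshold: float, n_permutations: int) -> bool:
    """True if the permutation p-value grid is at least as fine as the threshold."""
    return permutation_resolution(n_permutations) <= threshold


if __name__ == "__main__":
    n = 2_000_000  # permutation count reported in the Results
    print(permutation_resolution(n))
    print(can_resolve(1e-6, n))
```

With fewer permutations than the reciprocal of the threshold, no region could ever be declared significant, which is why the permutation count has to grow as the threshold tightens.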

Excess of significant markers test

We implement the ESM test as described in Thornton ). The test is a permutation-based variation of the rank truncated Fisher’s combined p-value method, using a null hypothesis based on order statistics. The test statistic is the sum of the differences between the observed and expected $-\log_{10}(p)$ values. However, the expected value under the null is the same for each permutation, and thus the statistic is equivalent to the sum of the observed $-\log_{10}(p)$ values, i.e., the RTP. For a set of $m$ markers, the expected p-value, under the null model of no association, of the $i$th most significant marker is $i/m$. Let $x$ be a vector of length $m$ containing the observed $-\log_{10}(p)$ values, sorted in order of decreasing significance, from the single-marker association test. Then the ESM test statistic is defined to be:

$$\mathrm{ESM} = \sum_{i=1}^{m} \left( x_i + \log_{10}\frac{i}{m} \right)$$

For each region, we calculate the ESM test statistic for the observed data, and for each permutation of the data. For a given region, let the set of ESM test statistics be $\{\mathrm{ESM}_0, \mathrm{ESM}_1, \ldots, \mathrm{ESM}_N\}$, such that $\mathrm{ESM}_0$ is the observed value and the rest are calculated from permuted data. Then, the p-value for that region is:

$$p = \frac{1}{N} \sum_{j=1}^{N} I\left(\mathrm{ESM}_j \geq \mathrm{ESM}_0\right)$$

where $I(\cdot)$ equals 1 when its argument is true, and 0 otherwise.

We performed the ESM test using a two-stage sliding window approach. Using 100-kb windows, we performed a genome scan with a jump size of 50 kb, with $m = 25$. The effect of changing m was explored in Thornton ), and the choice of 25 was based on average SNP density in the WTCCC data. Within each region, we filtered markers based on LD, taking only SNPs whose pairwise LD was less than 0.2, and always removing the SNP with the greater chromosomal position. While choosing this particular LD pruning rule is arbitrary, it prevents the introduction of bias due to selecting SNPs based on association significance. Regions that contained a marginally significant hit, with ESM p-values less than 1e–04, were rescanned using a finer (1 kb) jump size. The code implementing the test can be obtained from GitHub: https://github.com/ThorntonLab/ESMtest.
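The per-window procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (which is in the linked repository): it assumes the statistic sums observed minus expected -log10 p-values over the top-ranked markers, that the expected p-value of the ith ranked marker among k is i/k, that all p-values are strictly positive, and that pruning keeps a SNP only if its LD with every previously kept SNP is below the cutoff. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def ld_prune(ld: Sequence[Sequence[float]], max_ld: float = 0.2) -> List[int]:
    """Greedy pruning over SNPs sorted by chromosomal position.

    ld[i][j] is the pairwise LD between SNP i and SNP j. A SNP is kept only
    if its LD with every already-kept SNP is below max_ld, so within a
    correlated pair the SNP at the greater position is the one dropped.
    """
    kept: List[int] = []
    for j in range(len(ld)):
        if all(ld[i][j] < max_ld for i in kept):
            kept.append(j)
    return kept


def esm_statistic(pvalues: Sequence[float], m: int = 25) -> float:
    """Sum of observed minus expected -log10(p) over the top-ranked markers.

    Assumption: if a window holds fewer than m markers, all of them are used.
    """
    ranked = sorted(pvalues)[:m]  # most significant first
    k = len(ranked)
    return sum(-math.log10(p) + math.log10(i / k)
               for i, p in enumerate(ranked, start=1))


def permutation_pvalue(observed: float, permuted: Sequence[float]) -> float:
    """Fraction of permuted statistics at least as large as the observed one."""
    return sum(s >= observed for s in permuted) / len(permuted)


if __name__ == "__main__":
    toy_ld = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]]  # made-up values
    print(ld_prune(toy_ld))
    print(esm_statistic([0.001, 0.5, 0.02, 0.8], m=2))
    print(permutation_pvalue(2.0, [1.0, 2.5, 0.3, 2.0]))
```

Because the expected term depends only on the ranks, it is identical for the observed and permuted data, so dropping it would leave the permutation p-value unchanged.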
Contiguous genomic regions that contain windows reaching genome-wide significance at 1e–6 were taken and explored for functional annotations. This significance threshold results in a predicted genome-wide type-1 error rate of approximately 0.06; the mean (across diseases) number of total windows analyzed is 58,724, and, thus, the idealized type-1 error rate is 58,724 × 1e–6 ≈ 0.059. However, this estimate is quite conservative because the windows are spatially autocorrelated across the genome, making the effective number of tests performed much lower than the number of windows analyzed.
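As a rough illustration of the scan geometry and the error-rate arithmetic, the following Python sketch generates 100-kb windows with a 50-kb jump, merges overlapping significant windows into contiguous regions, and multiplies the window count by the marginal threshold. The function names, the zero-based half-open coordinates, and the truncation of the last window at the chromosome end are assumptions made for the example only.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int]  # (start_bp, end_bp), assumed half-open


def sliding_windows(chrom_length: int, size: int = 100_000,
                    step: int = 50_000) -> Iterator[Window]:
    """Yield windows of `size` bp, advancing by `step` bp each time."""
    start = 0
    while start < chrom_length:
        yield (start, min(start + size, chrom_length))
        start += step


def merge_windows(windows: List[Window]) -> List[Window]:
    """Merge overlapping or touching windows into contiguous regions."""
    merged: List[Window] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def expected_false_positives(n_windows: int, alpha: float) -> float:
    """Idealized genome-wide error rate, treating windows as independent."""
    return n_windows * alpha


if __name__ == "__main__":
    print(list(sliding_windows(250_000)))
    print(merge_windows([(0, 100_000), (50_000, 150_000), (300_000, 400_000)]))
    print(expected_false_positives(58_724, 1e-6))
```

Because neighboring windows share half of their sequence, treating them as independent overstates the number of tests, which is the sense in which the figure of about 0.06 is conservative.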

Intersection with other GWAS data

Significant regions were initially queried against the NHGRI GWAS database (http://www.ebi.ac.uk/gwas/). Regions were classified as being potentially novel if there were no significant SNPs in the NHGRI GWAS database for the specific disease whose genomic position fell within the boundaries of the region. Regions containing significant SNPs in the NHGRI GWAS database that were not contributed by the Wellcome Trust were also taken for further analysis. The regions were queried against gene and transcript annotations in human reference genome GRCh38 using the R package biomaRt (Durinck , 2009). The resulting gene and transcript annotations were manually curated for novelty and functional relevance.
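The classification rule described above amounts to an interval check of catalog SNP positions against each significant region. The Python sketch below shows one way this could be expressed; the record types, field names, and example catalog entries are hypothetical placeholders and do not reflect the actual NHGRI GWAS database schema or its contents.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    disease: str
    chrom: str
    start_mb: float
    end_mb: float


class CatalogHit(NamedTuple):
    disease: str
    chrom: str
    pos_mb: float
    from_wtccc: bool  # True if the hit was contributed by the Wellcome Trust study


def hits_in_region(region: Region, catalog: List[CatalogHit]) -> List[CatalogHit]:
    """Catalog hits for the same disease that fall inside the region bounds."""
    return [h for h in catalog
            if h.disease == region.disease
            and h.chrom == region.chrom
            and region.start_mb <= h.pos_mb <= region.end_mb]


def classify(region: Region, catalog: List[CatalogHit]) -> str:
    """Label a region according to the rule described in the text."""
    hits = hits_in_region(region, catalog)
    if not hits:
        return "potentially novel"
    if any(not h.from_wtccc for h in hits):
        return "reported by other studies"
    return "reported by WTCCC only"


if __name__ == "__main__":
    # Region bounds taken from Table 1; the catalog entries are made up.
    sema3c = Region("CAD", "7", 80.78, 80.88)
    toy_catalog = [CatalogHit("CAD", "7", 129.95, False),
                   CatalogHit("T2D", "7", 80.80, False)]
    print(classify(sema3c, toy_catalog))
```

Regions in either of the first two classes were carried forward for annotation and manual curation in the article.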

Data availability

Data were obtained from the Wellcome Trust Case Control Consortium (http://www.wtccc.org.uk/).

Results

We implement the ESM test as a sliding-window genome-wide scan for significant regions; we use 100 kb windows and 2 million permutations to reach genome-wide significance at an empirical p-value threshold of 1e-6. Region-/set-based methods result in far fewer tests than single-marker methods. By analyzing roughly 60,000 windows with a marginal p-value threshold of 1e-6, our genome-wide type-1 error rate was roughly 0.06; this estimate is conservative because the windows are not independent, and thus we effectively performed fewer tests than is suggested by the number of windows analyzed. Permutation procedures on genomic datasets are notoriously computationally expensive, and are thus typically avoided, despite their appealing statistical properties. With this in mind, we developed an efficient and freely available computational pipeline to implement the ESM test, which relies on new software and PLINK 1.90a (Purcell ) (see Materials and Methods). The pipeline leverages PLINK’s fast permutation procedures for single-marker association tests, stores the data in I/O-optimized HDF5 file format, and performs the test. Our analysis recapitulates most, but not all, of the associations established in the standard analysis of Wellcome ) and finds new associations, demonstrating that the ESM test is an excellent candidate for application in addition to standard methods.

Overlap between the ESM test and standard analysis

The majority of the regions found in Wellcome ) that showed strong associations with case-control status were also significant under the ESM test. In Wellcome ), the standard 1-df test resulted in 21 regions showing strong association signals (at a p-value threshold of 5e-7). Supplemental Material, Table S1 shows that 18 of these regions also reach the ESM test p-value threshold of 1e–6. Of the three regions that do not reach genome-wide significance under the ESM test, two have p-values between 1e–4 and 1e–6 (Table S2). In particular, multiple windows containing rs2542151, the main SNP reported for region chr18:12.77–12.92 (Mb) in association with inflammatory bowel disease, reach ESM p-values at the 9e–6 level. A third SNP, rs420259, in region chr16:23.38–23.7 (Mb), reported in association with bipolar disorder by Wellcome ), did not replicate in other studies (Tung ), and the region does not show strong association via the ESM test. Applying the SKAT (Wu ; Lee ) test to the same genomic windows results in less overlap with the WTCCC results (Table S4 and Table S5). Some of the regions not deemed significant by SKAT have been validated in other studies and can be viewed as false negatives. The ESM test has fewer false negatives. Because SKAT is not a permutation-based test, it is orders of magnitude faster computationally. However, our concern should focus primarily on getting better answers within the constraints of what is tractable. The ESM test is computationally feasible (Figure S2), and is shown here to give useful results. When we look at the overlaps and differences between the results of the ESM test and the single-marker test, we make two important observations. First, the ESM test has the power to detect genomic regions in association with disease status. Second, because there are regions that are identified only by the ESM test, or only by the single-marker test, we should view these methods as complementary.
The second point is conceptually important, but computationally trivial because one has to do a single marker test to serve as the input to the ESM test. The suggested workflow is essentially as follows: run the single marker test, run the ESM test, analyze both results separately, and then observe their union and intersection.
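The last step of the suggested workflow, comparing the two result sets, can be illustrated with plain set operations in Python. The region identifiers below are placeholders rather than real loci; only the counts (21 single-marker regions, 18 shared, 4 found by the ESM test alone) come from the article.

```python
from typing import Dict, Set


def compare_results(single_marker: Set[str], esm: Set[str]) -> Dict[str, Set[str]]:
    """Return the union, intersection, and method-specific regions."""
    return {
        "union": single_marker | esm,
        "intersection": single_marker & esm,
        "single_marker_only": single_marker - esm,
        "esm_only": esm - single_marker,
    }


if __name__ == "__main__":
    # Placeholder identifiers standing in for genomic regions.
    single_marker = {f"sm_region_{i}" for i in range(21)}
    shared = {f"sm_region_{i}" for i in range(18)}
    esm = shared | {f"esm_region_{i}" for i in range(4)}

    result = compare_results(single_marker, esm)
    for name, regions in result.items():
        print(name, len(regions))
```

With these counts the union holds 25 regions, of which 18 are supported by both methods.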

Strong associations replicated in independent datasets

Table 1 shows that the ESM test identifies four genomic regions that were not significant in the original WTCCC single-marker based analysis. Three out of four of these regions have since been associated with disease statuses in independent studies published in the years following the introduction of the WTCCC analysis (Table 1). These subsequent independent studies all leveraged datasets employing larger case/control panels and/or more densely genotyped SNPs than were originally used in Wellcome ). Published simulations suggest that the ESM test should accrue additional benefits when used on datasets with improved genotyping (see Figure 3 in Thornton ). In contrast, applying SKAT (Wu ; Lee ) to these same data and genomic windows was less promising. Although SKAT finds three significant regions that are not significant with a single marker test (Table S3), only two have support in studies, and no completely novel candidate genes are found. The number of new results is not significantly different between the ESM test and SKAT, but there does appear to be a qualitative difference in the level of plausibility. However, at present we cannot rule out differences in optimal approach to partitioning the genome, or differences in the type of signal detected in explaining the observed differences in ESM and SKAT results. Overall, three of the four novel associations identified using the ESM test are replicated, providing empirical support that the ESM test can detect novel true positive associations, even in relatively small data sets. We briefly describe the known biological significance of these three genomic regions below.

Table 1

New associations: regions reaching the ESM test p-value threshold of 1e-6 with no corresponding hit from Wellcome ) are reported below

Disease	Chr	Position (Mb)	Gene Region	Source
CAD	7	80.78–80.88	SEMA3C	This analysis
CAD	7	129.993–130.123	ZC3HC1/KLHDC10	(Erbilgin et al. 2013)
T1D, RA	22	37.096–37.203	IL2RB	(Plagnol et al. 2011; Eleftherohorinou et al. 2011; Okada et al. 2014; Chimusa et al. 2014)
IBD	1	172.872–172.983	FASLG/TNFSF18	(Franke et al. 2010; Jostins et al. 2012; Dubois et al. 2010)

Three out of four regions contain corresponding hits in the NHGRI GWAS database not due to Wellcome ) or were otherwise previously indicated in the particular disease as cited in the source column above. One region is novel based on our analysis, and overlaps with a biologically plausible gene, SEMA3C. CAD, coronary artery disease; T1D, type 1 diabetes; RA, rheumatoid arthritis; IBD, Crohn’s disease.

The region chr7:129.99–130.12 (Mb) is strongly associated with coronary artery disease (CAD) (Figure 1 and Table 1). This region overlaps two genes: ZC3HC1 and KLHDC10. A missense mutation in ZC3HC1, which is also a cis-eQTL for KLHDC10, has been previously associated with CAD (Erbilgin ). Neither gene currently has a clearly understood role in the etiology of CAD. The region chr22:37.09–37.21 (Mb), containing IL2RB, is associated with type 1 diabetes (T1D) (Figure 1 and Table 1). This region was nominally associated with rheumatoid arthritis (RA) by the WTCCC, but not with T1D. IL2RB has been associated with both diseases in multiple studies (Plagnol ; Eleftherohorinou ; Okada ; Chimusa ). Epidemiological associations with immune-related genes like IL2RB have motivated many important basic and clinical research studies (Pozzilli ). Finally, we find an intergenic region, chr1:172.87–172.99 (Mb), which contains SNPs previously associated with inflammatory bowel disease (IBD) (Franke ; Jostins ) and celiac disease (Dubois et al. 2010), to be associated with IBD (Figure 1 and Table 1). Both nearby genes, TNFSF18 and FASLG, are part of the immunologically important TNF superfamily. The presence of putatively active regulatory elements within this associated region (Figure S1) supports the association between variation in regulatory sequences and common diseases (Maurano ; Mathelier ).

Figure 1

Manhattan plots with ESM significant regions highlighted. Single-marker p-values vs. chromosomal position (BP) for all seven diseases analyzed, with SNPs in ESM significant regions (ESM p-value threshold of 1e–6) highlighted in green. Horizontal lines mark the typical single-marker genome-wide significance threshold. SNP clusters that are highlighted in green, but do not contain a single genome-wide significant SNP, are reported as novel.

Novel association: SEMA3C

The ESM test finds one additional novel region, not shown to be of genome-wide significance in any study to date, showing strong association with CAD: chr7:80.78–80.88 (Mb) (Table 1 and Figure 1). The only known protein-coding gene in this region is SEMA3C (Figure 2). A single SNP (rs4236644) in SEMA3C reached marginal significance (2e–6) in a meta-analysis of GWAS for total serum bilirubin levels (Johnson ). SEMA3C is a secreted neurovascular guiding molecule that has a number of developmental functions, and plays a role in cardiovascular development during embryogenesis (Püschel ; Feiner ). Certain congenital heart diseases are attributed to dysregulation of SEMA3C and its associated receptor PLXNA2 (Kodo ). SEMA3C is also an adipokine implicated in extracellular changes during white adipose tissue hypertrophy in human obesity (Mejhert ). In total, SEMA3C is a plausible candidate gene driving the observed ESM signal. However, we should note that the nearby (0.5 Mb away) gene CD36 is associated with heart-disease-related traits, including response to blood lipid drugs (Frazier-Wood ), platelet count, and HDL cholesterol in African Americans (Qayyum ; Coram ). Although Figure 2 demonstrates lower support for CD36, its presence could be driving the association with SEMA3C through long-range LD. Alternatively, the presence of CD36 might reflect the typical spatial clustering of functionally related genes found in many organisms (Hurst ). Overall, the association of SEMA3C with CAD is consistent with its known physiological function in the development of the heart, and thus makes it an intriguing candidate for future studies.

Figure 2

Region plot for SEMA3C hit. The top panel contains single-marker (black points) and ESM test (red triangles) p-values for coronary artery disease vs. chromosomal position in the region chr7:80–82 (Mb). Each ESM test point is plotted at the midpoint of the genomic window to which that p-value corresponds. The single 100 kb ESM significant region (ESM p-value threshold of 1e–6), chr7:80.78–80.88 (Mb), is demarcated by vertical dashed lines, and the horizontal lines indicate the ESM test significance threshold. The middle panel contains the recombination rate in cM/Mb obtained from HapMap throughout the same region. The lower panel shows the RefSeq gene UCSC Genome Browser track for the region.

Discussion

The power of the ESM test is highlighted by the fact that it can identify novel, biologically plausible associations in an approximately 10-yr-old data set that has been highly studied. We provide open-source software implementing the test, which can be applied to GWAS data in PLINK .ped/.bed file format. As a caveat, although the test is simple, performing millions of permutations on GWAS data sets is computationally intensive. Individual-level genotype data are a requirement of the ESM test. The test cannot be applied to summary statistics from case/control studies. If it is applied to data with greater SNP coverage across the genome, a finer-scale sliding window may be desirable, requiring more permutations to keep type-1 errors low. Nevertheless, simulations suggest that the power of the ESM test will increase significantly when the test is applied to data sets that have employed more modern, higher-density SNP chips (Thornton ). False positives due to LD between markers are often a concern for region-based analysis, and it has been shown that using permutation does not adequately address the impact of LD on variations of Fisher’s combined p-value (Moskvina ; Alves and Yu 2014). However, when SNP pruning is applied, as it is here, to reduce the maximum pairwise correlation to 0.2, the effect is predicted to be quite insignificant (Alves and Yu 2014). This agrees with the observation from Thornton ) that the ESM test did not result in any false positives under neutral simulations. We find that using rank truncated product methods in conjunction with single-marker analysis yields an approximately 20% gain in power over single-marker analysis alone, as illustrated by the finding of four new results on top of the preexisting 21 results from the standard method (4/21 ≈ 19%). The potential benefit of applying the ESM test in this way to all of the existing GWAS data is clear.
Given the extent of GWAS data currently in existence, it is conceivable that a broad application of the ESM test would establish thousands of new associations. An additional benefit of a broad application of the ESM test is the opportunity to validate hits in new datasets with older ones, as we demonstrated here. A key limitation of region/SNP-set based tests in general, including rank truncated product methods, is that one cannot simply validate a single marker or a small set of markers in a second panel. It is instead necessary to do deep genotyping of a candidate region in an independent panel in order to gain a perspective on the genetic variation present in the associated region. A corollary is that the lack of simple single-SNP markers makes the estimation of effect sizes and variance explained by a detected gene region difficult; this problem should be a focus of future studies. Using existing data, rank truncated product methods have power to detect new associations between genomic regions and disease. Notably, the development of more powerful region-based tests seems likely. The ESM test was designed to detect an association signal in case/control panels under a particular gene action model, and a small range of population genetic scenarios. Recent work (Moutsianas ) demonstrates that predictions from simulation studies regarding performance of region-based tests are impacted by various model details. Thus, future research should focus on the behavior of association tests under various models of gene action and demography.

69 in total

1. Permutation-based approaches do not adequately allow for linkage disequilibrium in gene-wide multi-locus association analysis.

Authors: Valentina Moskvina; Karl M Schmidt; Alexey Vedernikov; Michael J Owen; Nicholas Craddock; Peter Holmans; Michael C O'Donovan
Journal: Eur J Hum Genet Date: 2012-02-08 Impact factor: 4.246

Review 2. Heritability in the genomics era--concepts and misconceptions.

Authors: Peter M Visscher; William G Hill; Naomi R Wray
Journal: Nat Rev Genet Date: 2008-03-04 Impact factor: 53.242

3. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

4. Genome-wide association meta-analysis for total serum bilirubin levels.

Authors: Andrew D Johnson; Maryam Kavousi; Albert V Smith; Ming-Huei Chen; Abbas Dehghan; Thor Aspelund; Jing-Ping Lin; Cornelia M van Duijn; Tamara B Harris; L Adrienne Cupples; Andre G Uitterlinden; Lenore Launer; Albert Hofman; Fernando Rivadeneira; Bruno Stricker; Qiong Yang; Christopher J O'Donnell; Vilmundur Gudnason; Jacqueline C Witteman
Journal: Hum Mol Genet Date: 2009-05-04 Impact factor: 6.150

5. Systematic localization of common disease-associated variation in regulatory DNA.

Authors: Matthew T Maurano; Richard Humbert; Eric Rynes; Robert E Thurman; Eric Haugen; Hao Wang; Alex P Reynolds; Richard Sandstrom; Hongzhu Qu; Jennifer Brody; Anthony Shafer; Fidencio Neri; Kristen Lee; Tanya Kutyavin; Sandra Stehling-Sun; Audra K Johnson; Theresa K Canfield; Erika Giste; Morgan Diegel; Daniel Bates; R Scott Hansen; Shane Neph; Peter J Sabo; Shelly Heimfeld; Antony Raubitschek; Steven Ziegler; Chris Cotsapas; Nona Sotoodehnia; Ian Glass; Shamil R Sunyaev; Rajinder Kaul; John A Stamatoyannopoulos
Journal: Science Date: 2012-09-05 Impact factor: 47.728

6. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies.

Authors: Seunggeun Lee; Mary J Emond; Michael J Bamshad; Kathleen C Barnes; Mark J Rieder; Deborah A Nickerson; David C Christiani; Mark M Wurfel; Xihong Lin
Journal: Am J Hum Genet Date: 2012-08-02 Impact factor: 11.025

Review 7. Evidence-based psychiatric genetics, AKA the false dichotomy between common and rare variant hypotheses.

Authors: P M Visscher; M E Goddard; E M Derks; N R Wray
Journal: Mol Psychiatry Date: 2011-06-14 Impact factor: 15.992

8. Confirmation of multiple Crohn's disease susceptibility loci in a large Dutch-Belgian cohort.

Authors: Rinse K Weersma; Pieter C F Stokkers; Isabelle Cleynen; Simone C S Wolfkamp; Liesbet Henckaerts; Stefan Schreiber; Gerard Dijkstra; Andre Franke; Ilja M Nolte; Paul Rutgeerts; Cisca Wijmenga; Séverine Vermeire
Journal: Am J Gastroenterol Date: 2009-01-27 Impact factor: 10.864

9. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes.

Authors: John A Todd; Neil M Walker; Jason D Cooper; Deborah J Smyth; Kate Downes; Vincent Plagnol; Rebecca Bailey; Sergey Nejentsev; Sarah F Field; Felicity Payne; Christopher E Lowe; Jeffrey S Szeszko; Jason P Hafler; Lauren Zeitels; Jennie H M Yang; Adrian Vella; Sarah Nutland; Helen E Stevens; Helen Schuilenburg; Gillian Coleman; Meeta Maisuria; William Meadows; Luc J Smink; Barry Healy; Oliver S Burren; Alex A C Lam; Nigel R Ovington; James Allen; Ellen Adlem; Hin-Tak Leung; Chris Wallace; Joanna M M Howson; Cristian Guja; Constantin Ionescu-Tîrgovişte; Matthew J Simmonds; Joanne M Heward; Stephen C L Gough; David B Dunger; Linda S Wicker; David G Clayton
Journal: Nat Genet Date: 2007-06-06 Impact factor: 38.330

10. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip.

Authors: Chris C A Spencer; Zhan Su; Peter Donnelly; Jonathan Marchini
Journal: PLoS Genet Date: 2009-05-15 Impact factor: 5.917
