Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'.

Patrick Deelen¹, Androniki Menelaou², Elisabeth M van Leeuwen³, Alexandros Kanterakis¹, Freerk van Dijk¹, Carolina Medina-Gomez⁴, Laurent C Francioli², Jouke Jan Hottenga⁵, Lennart C Karssen³, Karol Estrada⁶, Eskil Kreiner-Møller⁷, Fernando Rivadeneira⁴, Jessica van Setten², Javier Gutierrez-Achury⁸, Harm-Jan Westra⁸, Lude Franke⁸, David van Enckevort⁹, Martijn Dijkstra¹, Heorhiy Byelas¹, Cornelia M van Duijn¹⁰, Paul I W de Bakker¹¹, Cisca Wijmenga⁸, Morris A Swertz¹.

Abstract

Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05–0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.

Year: 2014 PMID: 24896149 PMCID: PMC4200431 DOI: 10.1038/ejhg.2014.19

Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246

Introduction

Although genome-wide association studies (GWAS) have been very effective in identifying loci associated with diseases or traits,[1] it has proved difficult to fine-map the association signals to causal variants.[2, 3] To overcome these limitations, there has been increasing interest in the interrogation of less frequent variants, especially given the enrichment of deleterious alleles at low frequencies.[4, 5, 6, 7] There are specialized chips that can assess a larger number of rare variants, like the ImmunoChip[8] or Metabochip,[9] although they do not provide uniform genome-wide coverage. Hence, most investigators will use statistical imputation from SNP arrays in GWAS using dense reference panels. Imputation using a densely typed reference set can be performed to infer untyped variants that can be used to improve the power of a GWAS,[10] and there are numerous examples in which imputation has effectively enriched the results in GWAS.[11, 12] Although most large studies have so far been based on meta-analysis of HapMap-based imputations across cohorts, the primary limitation is that HapMap is essentially restricted to common variation (MAF>5%). Thanks to the sequencing of larger samples, such as 1000G, more complete reference panels are now being assembled, setting off a new wave of meta-analyses. The power of detecting an association in a GWAS is determined by its sample size and effective genome-wide coverage of the included variants, among other things.[13, 14] The effective coverage depends directly on the number and quality of the imputed genotypes.[15] In turn, the quality of the reference panel will depend largely on the number of samples, the quality of the haplotypes, and the number of variants included.[16] The Genome of The Netherlands (GoNL) has the potential to provide a good imputation reference panel. 
GoNL is a population-based sequencing project, in which 769 Dutch samples were sequenced at, on average, 14× coverage.[17] In particular, the fact that GoNL sequenced trios (231) or quartets (19) has enabled improved haplotype phasing by using one of the children.[18] The GoNL imputation reference set contains 998 unrelated haplotypes. In this paper, we report a quantitative analysis to assess the quality of genotypes imputed using both GoNL and 1000G in Dutch and other European populations. We adopted a 'gold standard' approach using samples genotyped on two distinct platforms, HumanHap550 and ImmunoChip. Hap550 is a commonly used genotyping chip designed to tag as many haplotypes as possible using common variants. ImmunoChip, however, is a fine-mapping chip: it contains a large number of low-frequency and rare variants for a limited number of loci (primarily selected on the basis of loci identified in immune-related traits). Starting from the Hap550-genotyped SNPs, we were able to impute a large number of variants present on ImmunoChip. We then compared these imputed genotypes with the measured ('gold standard') genotypes on ImmunoChip to quantify the imputation performance. We have such a data set for three European populations: the Dutch, British, and Italians. For each population we used 745 samples genotyped on both platforms. These three populations allowed us to ascertain population-specific differences in the imputation quality of SNPs.

Materials and methods

Genome of the Netherlands

GoNL is a project in which 769 individuals from different Dutch provinces were sequenced at, on average, 14× coverage.[17] All samples are part of either one of the 231 trios or one of the 19 quartets. The phasing was performed using the trio information,[18] and for the quartets one of the children was used to enhance the phasing. Because sequencing failed for two parents from different trios, these two samples were excluded from the imputation reference set. Instead, from these two trios, we used the haplotype of the child that was not present in the other parent. This resulted in an imputation reference set containing 998 unrelated haplotypes. We used GoNL release 4 for all our analyses (see http://www.nlgenome.nl). The current GoNL release 5 also contains over one million indels but did not change the SNPs.
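
The sample and haplotype counts quoted above can be checked with a few lines of arithmetic. The sketch below is only a consistency check. It assumes that each of the two affected trios contributes exactly one child haplotype in place of the excluded parent, which is one reading of the text rather than a documented procedure; all variable names are illustrative.

```python
# Consistency check of the GoNL panel sizes (assumption-labelled sketch).
TRIOS = 231
QUARTETS = 19
FAILED_PARENTS = 2  # two parents, from different trios, failed sequencing

# Every trio has 3 members and every quartet has 4.
samples = TRIOS * 3 + QUARTETS * 4

# Only parents are unrelated; each family contributes 2 parents.
parents = (TRIOS + QUARTETS) * 2
parental_haplotypes = (parents - FAILED_PARENTS) * 2

# Assumption: one child haplotype per affected trio replaces the missing parent.
child_haplotypes = FAILED_PARENTS * 1

reference_haplotypes = parental_haplotypes + child_haplotypes

print(samples, reference_haplotypes)  # prints: 769 998
```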

Benchmarking samples

Samples from a celiac disease patient cohort were selected, as they had been genotyped on both the Hap550 and ImmunoChip.[19] The 745 Dutch and the 745 British samples were all cases, whereas the 745 Italian samples comprised 371 cases and 374 controls. The clustering for the genotype calling of the ImmunoChip data had previously been performed manually to ensure proper genotyping results. The Hap550 (516 426 SNPs) data were filtered on MAF>1% and HWE P-value>1E-4 for each population separately. The ImmunoChip (113 991 SNPs) data were filtered on MAF>0.05% and HWE P-value>1E-4. Both data sets were then restricted to variants present in both the 1000G reference set and the GoNL reference set. After QC, the Dutch, British, and Italian Hap550 data contained 509 888, 509 984, and 510 225 SNPs, respectively. The ImmunoChip data contained, in the same order, 107 383, 107 212, and 107 611 SNPs.
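
As an illustration of the per-SNP quality control described above, the following sketch filters a SNP on MAF and on a Hardy-Weinberg equilibrium (HWE) P-value computed from its genotype counts. The text does not state which HWE test was used; the 1-degree-of-freedom chi-square approximation here is an assumption chosen for brevity (an exact test is often preferred for rare variants), and the function names are hypothetical.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """HWE P-value using the chi-square approximation with 1 df (assumption)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(counts, min_maf, min_hwe_p=1e-4):
    """Keep a SNP only if MAF and HWE P-value both exceed their thresholds."""
    return maf(*counts) > min_maf and hwe_chi2_pvalue(*counts) > min_hwe_p

# Thresholds taken from the text: MAF>1% (Hap550) and MAF>0.05% (ImmunoChip).
HAP550_MIN_MAF = 0.01
IMMUNOCHIP_MIN_MAF = 0.0005
```

With 745 samples, a SNP carried by a single heterozygote has a MAF of 1/1490 (about 0.067%), so it would pass the ImmunoChip MAF threshold but not the Hap550 one.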

Combining 1000G and GoNL data

The reference set combining data from 1000G and GoNL was created using the Impute2 option '-merge_ref_panels'. This merged reference set was written to a file and subsequently used for the benchmarking. As our benchmarking data are filtered for variants present in both reference sets, we did not assess the imputations of variants that are unique to either reference set.

Pre-phasing

The 745 samples for each population were pre-phased using SHAPEIT2.[15] This was done per chromosome using the default settings.

Imputation

The imputations were performed using Impute2 2.3.0.[16] The different populations were imputed separately and in chunks of 5 Mb. For the comparison using an equal number of European haplotypes, we performed an imputation using all 379 European 1000G samples and a random selection of 379 GoNL samples. The random selection of GoNL samples was performed stratified on the Dutch provinces. These samples were selected using the Impute2 option '-exclude_samples_h'. We used MOLGENIS compute[20] to implement the imputation pipeline, run the 8835 imputation chunks in parallel on a PBS compute cluster, and keep track of the 15 imputations (five for each population). All pipelines are available as open source via http://www.molgenis.org/wiki/ComputeStart.
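
Two steps in this paragraph lend themselves to a short sketch: splitting a chromosome into 5 Mb chunks, and drawing a random subset of samples stratified by province. The code below is a minimal illustration, not the MOLGENIS compute pipeline. The proportional (largest-remainder) allocation is an assumption, since the text only says the selection was stratified, and the function names are hypothetical.

```python
import random
from collections import defaultdict

CHUNK_SIZE = 5_000_000  # 5 Mb, as stated in the text

def chunk_intervals(chrom_length, chunk_size=CHUNK_SIZE):
    """Return 1-based inclusive (start, end) intervals covering a chromosome."""
    return [(start, min(start + chunk_size - 1, chrom_length))
            for start in range(1, chrom_length + 1, chunk_size)]

def stratified_sample(sample_to_stratum, n_total, seed=0):
    """Pick n_total samples, allocating picks to strata in proportion to size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample, stratum in sorted(sample_to_stratum.items()):
        strata[stratum].append(sample)
    total = len(sample_to_stratum)
    # Largest-remainder allocation: floor each quota, then hand out the rest.
    quotas = {s: n_total * len(members) / total for s, members in strata.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    chosen = []
    for s in sorted(strata):
        chosen.extend(rng.sample(strata[s], alloc[s]))
    return chosen
```

For example, `chunk_intervals(12_000_000)` yields three intervals, the last one shorter than 5 Mb. The samples left out of the stratified selection would be the ones listed in the exclusion file passed to Impute2.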

Gold standard method

As stated above, we used samples genotyped on two distinct platforms. Starting from the Hap550 genotypes of these samples, we imputed the SNPs that were present only in the ImmunoChip data and compared the imputed genotypes with the ImmunoChip genotypes. We used the ImmunoChip data as our 'gold standard'. The concordance between imputed genotypes and ImmunoChip genotypes was determined by calculating the Pearson correlation r² between the imputed dosage and ImmunoChip-observed genotypes. The mean concordances were calculated for three MAF bins: rare (≥0.05% and <0.5%), low-frequency (≥0.5% and <5%), and common (≥5%) SNPs. The MAF used to stratify the SNPs into the bins was calculated separately for each population. The results were plotted using R 2.14.2.[21] The significance of the differences between the reference sets was calculated using the Wilcoxon signed-rank test implementation in R.
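
The concordance measure and the MAF bins described above can be sketched as follows. This is a minimal illustration rather than the code used in the study (which relied on R): the expected dosage is assumed to be computed from the three genotype posterior probabilities as P(AB) + 2·P(BB), and the function names are hypothetical.

```python
import math

def expected_dosage(p_aa, p_ab, p_bb):
    """Expected count of the B allele from genotype posterior probabilities."""
    return p_ab + 2.0 * p_bb

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return (sxy * sxy) / (sxx * syy)

def maf_bin(maf):
    """Assign a MAF (as a fraction, not a percentage) to one of the three bins."""
    if 0.0005 <= maf < 0.005:
        return "rare"
    if 0.005 <= maf < 0.05:
        return "low-frequency"
    if maf >= 0.05:
        return "common"
    return None  # below the 0.05% threshold, not analysed
```

Per SNP, `pearson_r2` would be applied to the imputed dosages and the ImmunoChip genotypes (coded 0, 1, 2) across the samples of one population, and the resulting values averaged within each bin.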

Principal component analysis

The principal component analysis was performed using the EIGENSOFT 4.2 package.[22] The components were calculated using the European 1000G, GoNL, and the three GWAS data sets that we used for benchmarking. Before the components were calculated, all data sets were filtered to include only variants with MAF>5%. A joint data set, featuring variants present in all five data sets, was created. This merged data set was again filtered on MAF>5%, and also on HWE P-value>1E-4 and a call rate of 95%. It was then pruned using PLINK 1.07[23] with the '--indep-pairwise' option (window: 1000, step: 5, r² threshold: 0.2). The first component explained 0.33% of the variation and the second 0.10%. All subsequent components described less than 0.06%.

Results

We stratified our analysis into three groups: common variants (MAF≥5%), low-frequency variants (MAF 0.5–5%), and rare variants (MAF 0.05–0.5%). We focused mainly on the rare variants, as these are more difficult to impute and offer the most to gain in imputation quality from a better reference set. We observed a large increase in the imputation quality of rare variants when using GoNL as the reference compared with 1000G (Figure 1, Table 1). The mean observed Pearson correlation (r²) showed a significant increase from 0.61 to 0.71 for Dutch samples (Wilcoxon P-value=7.16E-60). The British and Italian imputations also showed a significant improvement when imputing rare variants, from 0.58 to 0.65 (P=3.70E-35) and from 0.43 to 0.47 (P=2.64E-13), respectively. GoNL also significantly outperformed the 1000G reference set in the imputation of variants with higher MAFs (Supplementary Figure S1, Supplementary Tables S2 and S3).

Figure 1

Comparison of imputation quality of rare variants using the 1000G data, GoNL, and the combined reference panel.

Table 1

Mean observed r² of rare variants

Reference set	Dutch	British	Italian
1000G	0.61	0.58	0.43
GoNL	0.71	0.65	0.47
1000G+GoNL	0.72	0.67	0.50

Abbreviation: GoNL, The Genome of The Netherlands.

Differences in the mean imputation quality between the reference sets were significant for each population (P<0.001).

Using a combined reference set composed of the 1000G and GoNL samples, we could improve the imputation further. The imputation of rare variants using the combined reference in Dutch and British samples showed a small increase in quality compared with GoNL-only imputation (0.02 (P=1.16E-03) and 0.02 (P=2.70E-05), respectively). The Italians benefitted most from the combined reference with an increase of 0.04 (P=3.62E-30) compared with a GoNL-only reference, resulting in a mean concordance for rare variants of 0.50. The differences in imputation quality when using the combined reference set for more frequent alleles were either very small or not significant (Supplementary Figure S1, Supplementary Tables S2 and S3). A striking trend in these results is that the imputation quality of rare variants in the Italian samples is lower than that in Dutch and British samples. The Dutch and Italian samples were genotyped at the same center and have similar call rates, and there were no indications that the genotyping quality of the Italian samples was lower. However, a principal component analysis revealed that the Italian samples were not as well represented by either 1000G or GoNL as the Dutch and British GWAS samples used for benchmarking (Figure 2).

Figure 2

Clustering of reference and study samples. PC1 and PC2 reveal three main clusters: Tuscans from Italy (TSI), Finnish (FIN), and a Western European cluster with the CEU (Utah Residents with Northern and Western European ancestry), the GBR (British) and the GoNL samples (a). b shows that most of our GWAS samples clustered in a similar way to the corresponding 1000G/GoNL samples.

We assessed whether the better performance of GoNL compared with 1000G was due to the larger number of European haplotypes in the reference set (998 vs. 758 in 1000G). We did this by performing an imputation using solely the 379 European samples in 1000G and a random subset of 379 GoNL samples. We found that the GoNL subset also significantly outperformed the European 1000G subset (Table 2).

Table 2

Mean observed r² of rare variants for reference sets of equal sample size from 1000G and GoNL (all of European descent)

Reference set	Dutch	British	Italian
1000G European	0.59	0.57	0.40
GoNL random subset 379 samples	0.68	0.64	0.45

Abbreviation: GoNL, The Genome of The Netherlands.

Differences in the mean imputation quality between the reference sets of equal sample size were significant for each population (P<0.001).

Our experimental design also allowed us to assess the calibration of the posterior probabilities of the genotypes as they are output by Impute2. We observed that the posterior probabilities were, in general, well calibrated, although we did observe a few deviations for low-frequency and rare variants (Figure 3a). To ascertain whether these deviations in posterior probabilities affect the predicted imputation quality, the Impute2 info metric, we plotted the predicted quality against the observed r². This showed a strong correlation between the predicted and observed quality for common variants and low-frequency variants (correlation of 0.97 and 0.91, respectively; Figures 3b and c). However, the info metric is not as accurate for rare variants, and the correlation with the observed r² dropped to 0.70 (Figure 3d). For rare variants we also observed some discrepancies in which a near-perfect imputation was predicted while the imputation was in fact poor, and vice versa.
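
One common way to check calibration of this kind is to bin the best-guess genotype calls by their posterior probability and compare the mean probability in each bin with the fraction of calls that match the gold standard. The sketch below illustrates that idea under stated assumptions: the text does not describe how Figure 3a was constructed, so the binning scheme and the function name are hypothetical.

```python
def calibration_table(calls, n_bins=10):
    """Summarise calibration of best-guess genotype calls.

    calls: iterable of (max_posterior, is_correct) pairs, where max_posterior
    is the probability of the called genotype and is_correct says whether the
    call matches the gold-standard genotype.
    Returns rows of (bin_lower, bin_upper, mean_posterior, observed_accuracy, n)
    for every non-empty bin.
    """
    sums = [[0.0, 0, 0] for _ in range(n_bins)]
    for prob, correct in calls:
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in last bin
        sums[idx][0] += prob
        sums[idx][1] += 1 if correct else 0
        sums[idx][2] += 1
    rows = []
    for i, (prob_sum, n_correct, n) in enumerate(sums):
        if n:
            rows.append((i / n_bins, (i + 1) / n_bins,
                         prob_sum / n, n_correct / n, n))
    return rows
```

For well-calibrated probabilities, the mean posterior and the observed accuracy in each row should be close; large gaps correspond to deviations from the diagonal in a calibration plot.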

Figure 3

Calibration of posterior probabilities. The posterior probabilities were, in general, well calibrated, although there were a few deviations from the expected accuracy (a). For common and low-frequency variants (b and c), we observed a strong correlation (r 0.97 and 0.91, respectively) between the Impute2 info metric and the observed r². However, for the rare variants (d), the relation between predicted and observed quality was weaker: we observed a correlation of 0.70 and several large deviations from the diagonal.

Discussion

We have shown that the new GoNL reference set provides higher downstream imputation accuracy than the 1000G reference set, not only for Dutch samples but also for other European populations studied in this paper. Aside from the increase in the imputation quality of rare variants in Dutch samples from 0.61 (1000G) to 0.71 (GoNL), we also observed an increase in imputation quality in British (0.58–0.65) and Italian (0.43–0.47) samples. We show that GoNL yielded better imputed genotypes for at least these European populations. A combined reference set, of 1000G and GoNL, increased the mean imputation quality of rare variants even further to 0.72, 0.67, and 0.50 for the Dutch, British and Italians, respectively. By selecting an identical number of European haplotypes from 1000G and GoNL, we showed a strong added value for GoNL in all the tested populations, confirming that the trio design of GoNL and the resultant accurate haplotypes aid the downstream imputation quality. We also observed a population-specific added value of GoNL when imputing Dutch samples. The added value (i.e., mean increase in imputation quality) was largest when comparing GoNL with 1000G in imputing the Dutch samples. Of course, it was already known that a better matched reference set will result in better imputed genotypes;[13] however, the results from this paper were based on low-frequency variants and we show that there is also an inter-European effect of reference sets. It is important to note that we only assessed variants present on the ImmunoChip. Although these variants were not randomly selected, we have no reason to assume that the imputation quality will be positively biased or that they do not represent low-frequency variants in general. The ImmunoChip was made to fine-map loci previously associated with autoimmune diseases using a large number of low-frequency and rare variants.
We were encouraged by the observation that the posterior probabilities were, in general, well calibrated with respect to the gold standard genotypes. We observed no adverse effects on the accuracy of the Impute2 info metrics, although for rare variants we did observe a few instances with large deviations between the predicted and observed quality. This is in line with previous observations.[24] This observed inaccuracy also emphasizes the importance of validating associations from imputed genotypes. It was shown earlier that a larger and more diverse reference set can improve the imputation of low-frequency variants.[25] We observed that a combination of 1000G and GoNL showed limited added value for the imputation of rare variants in the Dutch and British samples. It was, however, interesting to observe that the imputation of the Italian samples was improved more by this combined reference panel, leading us to speculate that populations that are poorly represented in the reference panel benefit more from a large and diverse reference set. Despite the limited added value for the Dutch and British data sets, such a large reference set may still be of interest for consortia aiming to impute cohorts of both European and non-European origin. All these cohorts can be imputed using the same combined reference set, with Impute2 automatically selecting the best-matching haplotypes.[26] We should note that we were only able to assess variants present in both reference sets, as there are very few variants on the ImmunoChip that are unique to either GoNL or 1000G. Nonetheless, our results show that population-specific reference sets and cosmopolitan panels, such as 1000G, can augment each other. This even holds true for the imputation of samples with ancestry other than those present in the population-specific reference sets, which provides further motivation for international efforts towards large and integrated reference sets.

24 in total

1. Optimal tests for rare variant effects in sequencing association studies.

Authors: Seunggeun Lee; Michael C Wu; Xihong Lin
Journal: Biostatistics Date: 2012-06-14 Impact factor: 5.899

2. Efficiency and power in genetic association studies.

Authors: Paul I W de Bakker; Roman Yelensky; Itsik Pe'er; Stacey B Gabriel; Mark J Daly; David Altshuler
Journal: Nat Genet Date: 2005-10-23 Impact factor: 38.330

3. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

4. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors: Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal: Proc Natl Acad Sci U S A Date: 2009-05-27 Impact factor: 11.205

5. A rare variant in MYH6 is associated with high risk of sick sinus syndrome.

Authors: Hilma Holm; Daniel F Gudbjartsson; Patrick Sulem; Gisli Masson; Hafdis Th Helgadottir; Carlo Zanon; Olafur Th Magnusson; Agnar Helgason; Jona Saemundsdottir; Arnaldur Gylfason; Hrafnhildur Stefansdottir; Solveig Gretarsdottir; Stefan E Matthiasson; Gu Mundur Thorgeirsson; Aslaug Jonasdottir; Asgeir Sigurdsson; Hreinn Stefansson; Thomas Werge; Thorunn Rafnar; Lambertus A Kiemeney; Babar Parvez; Raafia Muhammad; Dan M Roden; Dawood Darbar; Gudmar Thorleifsson; G Bragi Walters; Augustine Kong; Unnur Thorsteinsdottir; David O Arnar; Kari Stefansson
Journal: Nat Genet Date: 2011-03-06 Impact factor: 38.330

Review 6. Genotype imputation.

Authors: Yun Li; Cristen Willer; Serena Sanna; Gonçalo Abecasis
Journal: Annu Rev Genomics Hum Genet Date: 2009 Impact factor: 8.929

Review 7. Promise and pitfalls of the Immunochip.

Authors: Adrian Cortes; Matthew A Brown
Journal: Arthritis Res Ther Date: 2011-02-01 Impact factor: 5.156

8. Genotype imputation with thousands of genomes.

Authors: Bryan Howie; Jonathan Marchini; Matthew Stephens
Journal: G3 (Bethesda) Date: 2011-11-01 Impact factor: 3.154

9. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.

Authors: Ke Hao; Eugene Chudin; Joshua McElwee; Eric E Schadt
Journal: BMC Genet Date: 2009-06-16 Impact factor: 2.797

10. Bayesian refinement of association signals for 14 loci in 3 common diseases.

Authors: Julian B Maller; Gilean McVean; Jake Byrnes; Damjan Vukcevic; Kimmo Palin; Zhan Su; Joanna M M Howson; Adam Auton; Simon Myers; Andrew Morris; Matti Pirinen; Matthew A Brown; Paul R Burton; Mark J Caulfield; Alastair Compston; Martin Farrall; Alistair S Hall; Andrew T Hattersley; Adrian V S Hill; Christopher G Mathew; Marcus Pembrey; Jack Satsangi; Michael R Stratton; Jane Worthington; Nick Craddock; Matthew Hurles; Willem Ouwehand; Miles Parkes; Nazneen Rahman; Audrey Duncanson; John A Todd; Dominic P Kwiatkowski; Nilesh J Samani; Stephen C L Gough; Mark I McCarthy; Panagiotis Deloukas; Peter Donnelly
Journal: Nat Genet Date: 2012-10-28 Impact factor: 38.330
