Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

Mario Mitt^1,2, Mart Kals^1,3, Kalle Pärn^1,4, Stacey B Gabriel^5, Eric S Lander^5, Aarno Palotie^4,5, Samuli Ripatti^4, Andrew P Morris^1,6, Andres Metspalu^1,2, Tõnu Esko^1,5, Reedik Mägi^1, Priit Palta^1,4.

Abstract

Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF)≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations with that achieved with a population-specific, high-coverage (30×) whole-genome sequencing (WGS)-based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.

Year: 2017 PMID: 28401899 PMCID: PMC5520064 DOI: 10.1038/ejhg.2017.51

Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246

Introduction

Genotype imputation is a method for statistically inferring untyped genotypes in a sample of partially genotyped individuals, based on a reference panel of individuals who have been more densely genotyped or sequenced. Imputation methods attempt to identify haplotype sharing between individuals in the sample and in an imputation reference panel (IRP), and use this information to infer the alleles at untyped loci in the sample.[1] Imputation allows geneticists to study variants that have not been directly genotyped in a sample, thereby increasing the power and resolution of genome-wide association studies (GWAS). Imputation is particularly useful for combining association results across studies that used different genotyping arrays[2] and for fine-mapping association signals by considering all genetic variants in a region. Publicly available IRPs from the International HapMap Project[3, 4] and the 1000 Genomes Project (1000G)[5] have been instrumental in the discovery of thousands of loci affecting diseases and traits in individual GWAS and collaborative meta-analyses. The first wave of studies mostly used the HapMap II IRP, which comprised microarray-based genotypes from 270 individuals at 3.1 million (M) variants.[6, 7, 8, 9, 10] Later studies used IRPs from the 1000G project, which performed whole-genome sequencing (WGS) of 2504 individuals from a diverse set of populations and included up to 84.4 M variants.[11, 12, 13, 14, 15, 16] Although the latter IRP allows robust imputation of common (minor allele frequency (MAF)≥5%) and low-frequency (0.5≤MAF<5%) variants,[5] its imputation accuracy for rare (MAF<0.5%) variants is limited.[17, 18, 19] A more recent IRP from the Haplotype Reference Consortium (HRC)[20] contains even more individuals (N=32 488, mostly of European ancestry) and should therefore enable better imputation of both low-frequency and rare variants in European samples.
Recently, several studies have demonstrated that population-specific IRPs can further improve the imputation accuracy of common and low-frequency variants, and improve the imputation of rarer variants, in the relevant population.[21, 22, 23, 24] Using an IRP composed of related Dutch individuals, Deelen et al.[23] showed that it is possible to substantially improve the completeness and accuracy of rare-variant imputation in a set of Dutch individuals. Gudbjartsson et al. combined long-range haplotype phasing with imputation to increase imputation accuracy for rare variants down to a MAF of 0.1% in the Icelandic population.[22] Sidore et al. reported several variants associated with circulating lipid levels in Sardinians that were detected thanks to accurate imputation with a Sardinian WGS-based IRP; these authors showed that the variants would not have been identified had the analyses been based on the 1000G IRP.[24] Similar results were obtained in the UK10K project, where combining the British population-specific IRP with the 1000G reference panel facilitated the discovery of several novel genetic variants associated with medically relevant phenotypes.[19, 25, 26] Studies have shown that the genetic structure of European populations closely mirrors their geographic origin.[27, 28] The Estonian population, located in north-eastern Europe, is genetically most similar to the populations of neighbouring countries, including Finland, north-western Russia and the other Baltic countries.[28, 29, 30] Notwithstanding this overall genetic similarity, the Estonian population still carries a substantial proportion of haplotypes that are not expected to be covered by the more diverse IRPs. Moreover, these population-specific differences are expected to increase as allele frequencies decrease, because rarer variants tend to be more recent and thus more population-specific.
In the current study, we first evaluated two of the most commonly used phasing algorithms for creating a population-specific IRP from high-coverage (30×) WGS data of 2244 Estonian individuals. To impute low-frequency and rare variants more accurately in a specific population, one can take two approaches: (i) increase the size of IRPs from diverse populations to capture more reference haplotypes, or (ii) use population-specific IRPs. We assessed the utility of these approaches for improving imputation in Estonian samples by comparing the performance of (i) an Estonian-specific IRP, (ii) the commonly used 1000G IRP, (iii) the much larger HRC IRP and (iv) combinations of these panels.

Materials and methods

Cohort description

A total of 2304 geographically distributed individuals (selected randomly by county of birth) from the Estonian Biobank of the Estonian Genome Center, University of Tartu (EGCUT) were selected for WGS. EGCUT is a population-based biobank containing almost 52 000 samples from the adult population (aged ≥18 years), which closely reflects the age, sex and geographical distribution of the Estonian population. A further 6394 individuals from the Estonian Biobank (selected randomly and not overlapping with the WGS data set) were genotyped using the Illumina HumanCoreExome array (Illumina, San Diego, CA, USA), and a subset of 505 of these individuals also underwent whole-exome sequencing (WES).

WGS and WES sequencing and variant calling

WGS libraries were prepared using a PCR-free protocol and sequenced on the Illumina HiSeq X Ten (Illumina, San Diego, CA, USA) with 150 bp paired-end reads to a mean coverage of 30×, with a median insert size of 400 bp ± 25%. For WES samples, DNA was enriched for target sequences (Agilent Technologies, Santa Clara, CA, USA; Human All Exon V5+UTRs) according to the manufacturer's recommendations. Sequenced reads were aligned to the GRCh37/hg19 human reference genome using BWA-MEM[31] v0.7.7. SAMtools[32] v1.2 was used to convert SAM to BAM (samtools view), sort (samtools sort) and index (samtools index) the BAM files. PCR duplicates were then marked using Picard (http://broadinstitute.github.io/picard) v1.136 MarkDuplicates.jar. For further BAM improvements, including realignment around known indels and base quality score recalibration, we used the Genome Analysis Toolkit (GATK)[33, 34] v3.4 (v3.4-46). Genotypes were called for each sample with the GATK HaplotypeCaller in GVCF mode (-ERC GVCF). All gVCF files were then combined (-T CombineGVCFs) and jointly genotyped (-T GenotypeGVCFs).
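As an illustration of the alignment and calling steps above, the sketch below only assembles the command lines; it does not run them. File names, jar paths and the thread count are hypothetical, and the flags follow GATK 3 and current SAMtools/Picard syntax as we understand it, so the exact invocations used with SAMtools v1.2 and Picard v1.136 in this study may differ. Indel realignment and base quality score recalibration are left out to keep the example short.

```python
"""Sketch: build (not execute) the per-sample and joint-calling commands.

All file names and jar paths are hypothetical placeholders.
Indel realignment and BQSR (GATK v3.4) are omitted for brevity.
"""
from dataclasses import dataclass
from typing import List, Optional

GATK_JAR = "GenomeAnalysisTK.jar"  # hypothetical path to a GATK 3 jar
PICARD_JAR = "picard.jar"          # hypothetical path to a Picard jar


@dataclass
class Step:
    argv: List[str]
    stdout: Optional[str] = None  # file that standard output should be redirected to


def per_sample_steps(sample: str, ref: str, fq1: str, fq2: str, threads: int = 4) -> List[Step]:
    """FASTQ pair -> sorted, duplicate-marked BAM -> per-sample GVCF."""
    sam, bam, srt = f"{sample}.sam", f"{sample}.bam", f"{sample}.sorted.bam"
    dedup, gvcf = f"{sample}.dedup.bam", f"{sample}.g.vcf"
    return [
        # BWA-MEM writes SAM records to standard output
        Step(["bwa", "mem", "-t", str(threads), ref, fq1, fq2], stdout=sam),
        Step(["samtools", "view", "-b", "-o", bam, sam]),  # SAM -> BAM
        Step(["samtools", "sort", "-o", srt, bam]),        # coordinate sort
        Step(["samtools", "index", srt]),                  # create .bai index
        Step(["java", "-jar", PICARD_JAR, "MarkDuplicates",
              f"INPUT={srt}", f"OUTPUT={dedup}", f"METRICS_FILE={sample}.dup_metrics.txt"]),
        # Per-sample calling in GVCF mode, as in the text (-ERC GVCF)
        Step(["java", "-jar", GATK_JAR, "-T", "HaplotypeCaller", "-R", ref,
              "-I", dedup, "-ERC", "GVCF", "-o", gvcf]),
    ]


def joint_calling_steps(ref: str, gvcfs: List[str], out_vcf: str = "cohort.vcf") -> List[Step]:
    """Combine the per-sample GVCFs, then genotype them jointly."""
    combine = ["java", "-jar", GATK_JAR, "-T", "CombineGVCFs", "-R", ref]
    for g in gvcfs:
        combine += ["--variant", g]
    combine += ["-o", "combined.g.vcf"]
    genotype = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", ref,
                "--variant", "combined.g.vcf", "-o", out_vcf]
    return [Step(combine), Step(genotype)]


if __name__ == "__main__":
    for step in per_sample_steps("S1", "hg19.fa", "S1_R1.fq.gz", "S1_R2.fq.gz"):
        print(" ".join(step.argv), f"> {step.stdout}" if step.stdout else "")
```

Keeping the commands as argument lists makes them easy to pass to `subprocess.run` or to a workflow manager once the paths have been adapted.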

Quality control

Out of the 2304 WGS samples submitted for sequencing, four had insufficient input DNA (<1.2 μg), seven failed library preparation three times and nine had a contamination rate >10%. Variants were therefore jointly called in the remaining 2284 WGS samples. GATK Variant Quality Score Recalibration was used to filter variants at a truth sensitivity of 99.8%. In addition, variants with a GATK inbreeding coefficient below −0.3 were filtered out to remove sites with an excess of heterozygous individuals. Only PASS sites were considered in further analyses. The PLINK/SEQ (https://atgu.mgh.harvard.edu/plinkseq) v0.10 i-stats module was used to calculate, for each sample, the number of variants (NVAR), the number of non-reference variants (NALT), the number of heterozygous variants (NHET), the NHET/NALT ratio and the transition/transversion (TITV) ratio; outlier samples (more than 3 SD below or above the population mean) were removed. In addition, concordance between genotype-inferred and recorded (phenotype) sex was checked for each sample, and discordant samples were removed. The final WGS sample set contained 2244 individuals. The final WES sample set, which passed all quality control filters and was also genotyped with the Illumina HumanCoreExome array, contained 505 individuals. Multi-allelic SNVs were removed, and we further excluded variants with call rate <0.95, minor allele count ≤2 or Hardy–Weinberg equilibrium test P-value <1 × 10^−6, as well as variants in low-complexity regions.[35] Genotype array data were filtered sample-wise by excluding samples with low call rate (<98%), extreme heterozygosity (outside mean ± 3 SD), discordance between genotype-inferred and recorded sex, cryptic relatedness (IBD>20%) or non-European ancestry (outliers on an MDS plot compared with HapMap reference samples). SNP-level filtering excluded SNPs with call rate <99%, MAF <1% or extreme deviation from Hardy–Weinberg equilibrium (P-value <1 × 10^−4). Non-autosomal SNPs were excluded from the analysis.
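The sample-level outlier rule (more than 3 SD from the population mean) and two of the variant-level filters (call rate and minor allele count) can be sketched as follows. This is an illustrative sketch, not the PLINK/SEQ or GATK implementation: genotypes are assumed to be biallelic alternative-allele counts, using the sample standard deviation is our assumption, and the Hardy–Weinberg and low-complexity filters are not reproduced.

```python
import statistics
from typing import Dict, List, Optional, Set


def sd_outliers(metric: Dict[str, float], n_sd: float = 3.0) -> Set[str]:
    """Samples whose value lies more than n_sd standard deviations from the mean."""
    values = list(metric.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption; the text does not specify)
    return {s for s, v in metric.items() if abs(v - mean) > n_sd * sd}


def qc_failures(metrics: Dict[str, Dict[str, float]]) -> Set[str]:
    """Union of outliers over several per-sample metrics (e.g. NVAR, NALT, TITV)."""
    failed: Set[str] = set()
    for per_sample in metrics.values():
        failed |= sd_outliers(per_sample)
    return failed


def keep_variant(genotypes: List[Optional[int]],
                 min_call_rate: float = 0.95,
                 max_excluded_mac: int = 2) -> bool:
    """genotypes: alternative-allele counts (0/1/2) per sample, None if missing.

    Keeps the variant only if call rate >= 0.95 and minor allele count > 2.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    alt = sum(called)
    mac = min(alt, 2 * len(called) - alt)  # count of the less frequent allele
    return mac > max_excluded_mac
```

Each i-stats metric is screened independently and a sample failing any of them is dropped, which is what `qc_failures` does with the union.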

Haplotype phasing

The EGCUT WGS data were phased with SHAPEIT2[36] (r837) using four computer cores. Pre-phasing of the genotype array data was performed in the same way, also with SHAPEIT2 on four cores. As a separate test of phasing accuracy, we used the chromosome 20 sequences of the 2244 genomes, after excluding non-founder family members and individuals with a genome-wide PI_HAT value above 0.5 relative to other individuals in the data set (2195 individuals remained). To assess the efficiency of different approaches to phasing WGS data, we applied two tools: SHAPEIT2[36] and Eagle2.[37, 38] Both programs were run with default parameters and varying numbers of cores (1, 2, 4, 8, 16, 24 and 32). To check phasing accuracy on another data set, the 1000G data were phased using a similar pipeline (1, 8 and 32 cores). In addition to the regular phasing functionality, the read-aware phasing capability of SHAPEIT2[39] was also assessed. The first step was to extract phase informative reads (PIRs) from the BAM files using the ExtractPIRs v1 (r68) module with the default parameters provided by the authors. The resulting PIR file was then supplied to SHAPEIT2 as an additional input, in a similar way to the genetic map file, when phasing the data sets. Phasing was performed in three parallel runs, after which the average run time and accuracy were compared as indicators of phasing performance. Phasing accuracy was measured as the number of switch errors in the phased data set. To count these, the phased founder genotypes were compared with the unphased genotypes of their offspring to determine which parental haplotype was inherited at each heterozygous position; any shift in inheritance from one parental haplotype to the other was counted as a switch.
Two families with one offspring and two families with two offspring were used to estimate the switch error rate in the EGCUT sample set, and four families with one offspring each were used for the 1000G sample set. The switch error rate was calculated by dividing the number of haplotype switches by the number of heterozygous positions at which the occurrence of a switch could be reliably determined, and the results were averaged across trios.
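The trio-based switch counting described above can be written down compactly. In this sketch (our own formulation, not the authors' code) genotypes are biallelic alternative-allele counts, the parent's phased alleles are given only at its heterozygous sites, and the transmitted allele is inferred from the unphased child and co-parent genotypes where possible. As in the description, a genuine crossover in the parent's meiosis would also register as a switch; the sketch does not try to separate the two.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def transmitted_allele(child: int, other_parent: int) -> Optional[int]:
    """Allele (0 = ref, 1 = alt) a heterozygous parent passed to the child at a
    biallelic site, or None if unphased genotypes cannot resolve it."""
    if child == 0:
        return 0
    if child == 2:
        return 1
    if child == 1 and other_parent in (0, 2):
        # A homozygous co-parent can only pass its own allele, so this parent passed the other one
        return 1 - other_parent // 2
    return None  # child and co-parent both heterozygous: ambiguous


def switch_errors(parent_haps: Sequence[Tuple[int, int]],
                  transmitted: Sequence[Optional[int]]) -> Tuple[int, int]:
    """parent_haps: phased (hap0, hap1) alleles at the parent's heterozygous sites, in order.

    Returns (number of switches, number of informative heterozygous sites).
    """
    origin: List[int] = []  # which parental haplotype was inherited at each informative site
    for (h0, h1), t in zip(parent_haps, transmitted):
        if t is None or h0 == h1:
            continue
        origin.append(0 if t == h0 else 1)
    switches = sum(1 for a, b in zip(origin, origin[1:]) if a != b)
    return switches, len(origin)


def mean_switch_error_rate(trios: Sequence[Tuple[int, int]]) -> float:
    """Average over trios of switches / informative sites."""
    return mean(s / n for s, n in trios if n > 0)
```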

Genotype imputation

Imputation with the EGCUT and 1000G reference panels, separately and in combination, was performed at the High Performance Computing Center, University of Tartu, using IMPUTE2 with default parameters. Because IMPUTE2 can use two phased reference panels together (the ‘imputation with two phased reference panels’ option), we also used the EGCUT and 1000G reference panels in combination (EGCUT+1000G and 1000G+EGCUT). When panels are combined in this way, IMPUTE2 imputes genotypes only for variants present in the first (main) panel, but uses additional haplotype information from the second panel to improve imputation accuracy through a larger set of reference haplotypes.[40] Imputation with the HRC panel was carried out using IMPUTE2 with default parameters, except that the k_hap parameter was set to 1000. Monomorphic SNVs were excluded for all imputation panels. No further filtering was performed based on the IMPUTE2 INFO score, but most analyses focus on well-imputed (INFO>0.4) and confidently imputed (INFO>0.8) SNVs.

Post-imputation filtering and concordance analyses

The GATK GenotypeConcordance tool was used to calculate imputation accuracy (concordance, non-reference sensitivity and non-reference discordancy) for the different imputation panels, with WES data for the overlapping individuals (N=505) used as the gold standard. Low-complexity regions were filtered out of the WES data before analysis. PLINK v1.9 was used to convert the IMPUTE2 output files to VCF format using a hard-call threshold of 0.9. The BCFtools filter command was used to keep variants imputed with INFO>0.4 that overlapped the WES target regions. Comparisons were performed in three MAF bins (MAF≥5%, 0.5≤MAF<5% and MAF<0.5%) based on WES minor allele frequencies, and only well-imputed (INFO>0.4) SNVs were considered. The same reference genome sequence was used in both the WGS and WES analysis pipelines. To assess imputation accuracy at a finer resolution, an additional concordance analysis was run for well-imputed (INFO>0.4) variants in WES-based MAF bins of (0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 10), (10, 20), (20, 30), (30, 40) and (40, 50%).
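The conversion to hard calls and the MAF stratification can be illustrated with a short sketch. We interpret the hard-call threshold of 0.9 as keeping the most probable genotype only when its posterior probability is at least 0.9 and setting it to missing otherwise; PLINK's own option may parameterise this differently, so its documentation should be checked. Treating each fine bin as half-open [lower, upper) is also an assumption, chosen so that a MAF of exactly 5% falls in the common class, as in the three-bin scheme.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

# Fine MAF bin edges from the text, written as fractions; [lower, upper) is an assumption
FINE_EDGES = [0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.02, 0.03, 0.04, 0.05,
              0.10, 0.20, 0.30, 0.40, 0.50]


def hard_call(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Most probable genotype (0/1/2 alt alleles) if its probability >= threshold, else None."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def maf_class(maf: float) -> str:
    """Three-way classification used throughout the study."""
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


def fine_bin(maf: float) -> Tuple[float, float]:
    """(lower, upper) fine bin for a MAF in (0, 0.5]; 0.5 goes into the last bin."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("MAF must be in (0, 0.5]; monomorphic sites are excluded")
    i = min(bisect_right(FINE_EDGES, maf) - 1, len(FINE_EDGES) - 2)
    return FINE_EDGES[i], FINE_EDGES[i + 1]
```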

Functional annotation of variants

We used Variant Effect Predictor[41] version 84 to annotate the confidently imputed variants in the 20 345 protein-coding genes in the Ensembl database (with Gencode v19 on GRCh37).

Results

Using high-coverage WGS data of 2244 Estonian individuals from the Estonian Biobank,[42] we created a population-specific IRP. After variant calling and rigorous quality control (Materials and Methods), we phased the Estonian WGS data and used the resulting Estonian IRP (referred to here as the EGCUT IRP, after the Estonian Genome Center at University of Tartu), together with the 1000G and HRC IRPs, to impute genotypes into 6394 Estonians who had been genotyped on microarrays.

Phasing speed and accuracy of multi-threaded haplotype phasing

Haplotype phasing can be time-consuming, especially for large WGS-based data sets. We therefore began by evaluating haplotype-phasing approaches. We compared three multi-threaded approaches, SHAPEIT2,[36] SHAPEIT2-RA (read-aware mode)[39] and Eagle2,[37, 38] run with different numbers of processor cores (1, 2, 4, 8, 16, 24 and 32) (Supplementary Figure 1A). These programmes were applied to chromosome 20 data from the EGCUT samples. Accuracy was assessed by counting haplotype switch errors (Materials and Methods) in four families, for which haplotype phase could be determined independently from the segregation of genetic markers. The phasing speed of both SHAPEIT2 and SHAPEIT2-RA (excluding PIR extraction for the latter) kept increasing with the number of cores used, roughly in proportion up to 16 cores and with diminishing gains beyond that, whereas Eagle2 became faster only up to eight cores; with 24 and 32 cores its running time increased again (Table 1). Up to eight cores, Eagle2 was roughly six times faster than SHAPEIT2. The two SHAPEIT2 modes showed similar accuracy, which was slightly higher than that of Eagle2 (average haplotype switch error rate of 0.7% with SHAPEIT2 vs 0.81% with Eagle2; Table 1). In all cases, accuracy did not vary significantly with the number of cores used. To check that these results were not population-specific, we performed similar analyses with four 1000G family trios (with 1, 8 and 32 cores) and observed similar switch error rates (Supplementary Table 1). Although in our hands SHAPEIT2 was slightly more accurate, this came at the cost of longer computing time, making Eagle2 a viable option for researchers who need time-efficient phasing of large data sets. However, because the 1000G and HRC IRPs were phased with SHAPEIT, we used SHAPEIT2 to phase the EGCUT data (Materials and Methods).

Table 1

Phasing speed and accuracy to phase chromosome 20 of the EGCUT WGS data

No of cores	SHAPEIT2: % of switch errors (no of errors)	SHAPEIT2: time (h)	SHAPEIT2 read-aware: % of switch errors (no of errors)	SHAPEIT2 read-aware: time^a (h)	Eagle2: % of switch errors (no of errors)	Eagle2: time (h)
1	0.72 (257)	179	0.70 (246)	293 (169)	0.81 (291)	29
2	0.71 (255)	98	0.71 (248)	216 (92)	0.81 (291)	15
4	0.70 (250)	51	0.70 (247)	174 (50)	0.81 (291)	8
8	0.71 (254)	28	0.71 (248)	150 (26)	0.81 (291)	5
16	0.71 (254)	16	0.70 (245)	139 (15)	0.81 (291)	5
24	0.70 (251)	12	0.71 (249)	136 (12)	0.81 (291)	10
32	0.70 (253)	11	0.69 (244)	135 (11)	0.81 (291)	9

Abbreviations: EGCUT, Estonian Genome Center, University of Tartu; PIR, phase informative read.

Phasing errors (measured as the percentage and count of switch errors out of 35 780 heterozygous positions at which switches could be assessed) and running times for different numbers of processor cores (1, 2, 4, 8, 16, 24 and 32).

^a Total running time, including the extraction of PIRs from the raw sequencing data (BAM files). Haplotype-phasing time (without PIR extraction) is given in parentheses.
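Table 1 can be summarised as speed-ups relative to a single core. The short script below uses only the running times reported in the table (SHAPEIT2 and Eagle2 columns); the helper names are our own.

```python
CORES = [1, 2, 4, 8, 16, 24, 32]
# Running times in hours, copied from Table 1 (SHAPEIT2 read-aware omitted)
HOURS = {
    "SHAPEIT2": [179, 98, 51, 28, 16, 12, 11],
    "Eagle2": [29, 15, 8, 5, 5, 10, 9],
}


def speedup(times):
    """Speed-up of each run relative to the single-core run."""
    return [times[0] / t for t in times]


def efficiency(times, cores):
    """Parallel efficiency: speed-up divided by number of cores (1.0 = ideal scaling)."""
    return [s / c for s, c in zip(speedup(times), cores)]


# How many times faster Eagle2 was than SHAPEIT2 at each core count
eagle_advantage = [s / e for s, e in zip(HOURS["SHAPEIT2"], HOURS["Eagle2"])]

if __name__ == "__main__":
    rows = zip(CORES, speedup(HOURS["SHAPEIT2"]), speedup(HOURS["Eagle2"]), eagle_advantage)
    for c, sp_s, sp_e, adv in rows:
        print(f"{c:>2} cores  SHAPEIT2 x{sp_s:4.1f}  Eagle2 x{sp_e:4.1f}  Eagle2 advantage {adv:3.1f}")
```

With these numbers, SHAPEIT2 reaches roughly a 16-fold speed-up on 32 cores (parallel efficiency of about 0.5), Eagle2 levels off at about 5.8-fold on 8 and 16 cores, and Eagle2 is about 5.6 to 6.5 times faster than SHAPEIT2 on 1 to 8 cores.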

To impute genotypes into 6394 Estonian individuals genotyped on Illumina HumanCoreExome microarrays, we used the IMPUTE2 software[43, 44] with three separate IRPs and two combinations of IRPs (Table 2). The first IRP consisted of the 2244 whole-genome sequenced EGCUT individuals; these individuals were selected to be geographically distributed across Estonia and did not overlap with the set of genotyped individuals. The other two were the 1000G and HRC IRPs, which are derived from large, diverse populations. IMPUTE2 can also use two reference panels simultaneously, pooling haplotype information across both IRPs to improve imputation accuracy.[40] We used both orderings of the EGCUT and 1000G panels with this option: EGCUT+1000G and 1000G+EGCUT. In such combinations, IMPUTE2 imputes genotypes only for variants present in the first (main) IRP, while also using haplotype information from the second IRP to improve accuracy through a larger set of reference haplotypes. Thus, EGCUT+1000G should be viewed as an improvement of the EGCUT panel (variants observed in the EGCUT panel imputed using haplotypes from both the EGCUT and 1000G panels), and 1000G+EGCUT as an improvement of the 1000G panel (variants observed in the 1000G panel imputed using haplotypes from both panels).
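The asymmetry between EGCUT+1000G and 1000G+EGCUT can be made concrete with a toy model. The `Panel` class and `combine` function below are hypothetical and only restate the rule given above (output sites from the main panel, haplotypes pooled from both panels); they do not describe how IMPUTE2 works internally. The haplotype counts come from Table 2, and the variant identifiers in the example are invented.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Panel:
    name: str
    n_haplotypes: int
    sites: FrozenSet[str]  # identifiers of the variants present in the panel


def combine(main: Panel, secondary: Panel) -> Panel:
    """Conceptual model of two-panel imputation: only the main panel's variants
    are imputed, but haplotypes from both panels are available as templates."""
    return Panel(
        name=f"{main.name}+{secondary.name}",
        n_haplotypes=main.n_haplotypes + secondary.n_haplotypes,
        sites=main.sites,
    )


# Haplotype counts from Table 2; site identifiers are made up for illustration
egcut = Panel("EGCUT", 4488, frozenset({"shared1", "shared2", "estonian_only"}))
kg = Panel("1000G", 5008, frozenset({"shared1", "shared2", "african_only"}))
```

Both orderings give 9496 haplotypes, matching Table 2, but only the main panel's variants are imputed (16.5 M SNVs for EGCUT+1000G versus 81.0 M for 1000G+EGCUT).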

Table 2

Description of compared IRPs

IRP	1000G	HRC	EGCUT	EGCUT + 1000G	1000G + EGCUT
Description	26 cohorts worldwide	20 cohorts of mostly European ancestry	Estonian diversity panel	1+26 cohorts worldwide	26+1 cohorts worldwide
Average sequencing coverage	7.4 ×	4–8 ×	29.8 ×	29.8 ×	7.4 ×
MAC filter	MAC≥1	MAC≥5	MAC≥3	MAC≥1	MAC≥1
No of haplotypes	5008	64976	4488	9496	9496
No of autosomal SNVs	81 027 987	39 235 157	16 536 512	16 536 512	81 027 987

Abbreviations: HRC, Haplotype Reference Consortium; IRP, imputation reference panel; SNV, single-nucleotide variant.

Number of imputed variants

For each IRP, we studied the number of imputed single-nucleotide variants (SNVs) as a function of the imputation confidence estimate—INFO-value—assigned by the IMPUTE2 programme. The INFO-value reflects the information in imputed genotypes relative to the information if only the allele frequency were known.[43, 44] We counted the total number of imputed SNVs, the number of ‘well-imputed’ SNVs (INFO>0.4)[18] and the number of ‘confidently imputed’ SNVs (INFO>0.8). We also counted the number of imputed SNVs found only with each IRP (Figure 1a).
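To make the INFO-value more concrete, the sketch below computes an IMPUTE-style info measure from per-sample genotype probabilities, using the formulation we understand from the IMPUTE literature (Marchini and Howie, 2010): one minus the ratio of the remaining dosage uncertainty to the binomial variance expected from the estimated allele frequency. IMPUTE2's internal computation may differ in details; the thresholds in the hypothetical `confidence_class` helper are the ones used in this study.

```python
from typing import Sequence, Tuple


def impute_info(probs: Sequence[Tuple[float, float, float]]) -> float:
    """IMPUTE-style info measure for one SNV.

    probs: per-sample posterior probabilities of 0, 1 or 2 copies of the allele.
    """
    n = len(probs)
    e = [p1 + 2 * p2 for _, p1, p2 in probs]  # expected allele dosage per sample
    f = [p1 + 4 * p2 for _, p1, p2 in probs]  # expected squared dosage per sample
    theta = sum(e) / (2 * n)                  # allele frequency estimated from dosages
    if theta <= 0.0 or theta >= 1.0:
        return 1.0                            # convention for monomorphic estimates
    remaining = sum(fi - ei * ei for fi, ei in zip(f, e))  # summed per-sample dosage variance
    return 1.0 - remaining / (2 * n * theta * (1.0 - theta))


def confidence_class(info: float) -> str:
    """Highest category reached; confidently imputed SNVs are also well-imputed."""
    if info > 0.8:
        return "confidently imputed"
    if info > 0.4:
        return "well-imputed"
    return "below INFO 0.4"
```

Genotypes called with certainty give INFO = 1, whereas probabilities that merely restate the Hardy–Weinberg prior give INFO = 0.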

Figure 1

Number of variants imputed from different IRPs. (a) Number of all shared and panel-specific variants in three distinct reference panels imputed with INFO-value >0.4 (in bold) and >0.8 (given in brackets); (b) Total number of imputed SNVs (bars); the number of SNVs imputed with imputation quality score (INFO-value)>0.4 (coloured) and INFO>0.8 (shaded areas).

Although the total number of imputed variants and the number of well-imputed variants obtained with the larger diverse panels (1000G and HRC) exceeded the corresponding numbers for the population-specific panel, the situation was reversed for confidently imputed SNVs: 12.29 M (75% of the total number of imputed SNVs), 10.05 M (48%) and 9.44 M (27%) SNVs were confidently imputed with the EGCUT, HRC and 1000G panels, respectively (Figure 1b). The combined EGCUT+1000G panel gave almost identical results to the EGCUT panel alone, whereas the 1000G+EGCUT panel yielded a considerable increase in the number of confidently imputed SNVs compared with the 1000G panel alone, by drawing on additional haplotype information from the population-specific IRP. These results indicate that using a population-specific IRP increases the number of confidently imputed variants, most likely because allele frequencies are more similar and relatedness is greater between the samples and the IRP. More diverse IRPs tend to have allele frequency distributions that do not match the target population and to contain divergent haplotypes that are not present in the samples (e.g., African haplotypes carrying variants that are not polymorphic in non-African populations). We next stratified these analyses by the MAF of the imputed SNVs, dividing them into three groups: common (MAF≥5%), low-frequency (0.5≤MAF<5%) and rare (MAF<0.5%) SNVs. For common variants, the number of imputed SNVs was very similar across the IRPs (Figure 2). For low-frequency variants, the number of well-imputed SNVs was also very similar, whereas the number of confidently imputed SNVs was larger for the population-specific IRP. For rare variants, the differences were even more pronounced: 3.48 M (54% of well-imputed rare variants), 2.54 M (33%) and 1.86 M (15%) SNVs were confidently imputed from the EGCUT, HRC and 1000G panels, respectively (Figure 2b, Supplementary Table 2).
Notably, the EGCUT panel outperformed the other panels on rare variants despite the fact that the HRC panel contains the largest number of haplotypes (64 976) and the 1000G panel contains the largest number of variants (81 M SNVs on autosomes).

Figure 2

Number of common (MAF≥5%), low-frequency (0.5≤MAF<5%) and rare (MAF<0.5%) variants imputed from different IRPs. (a) Number of well-imputed SNVs (imputed with imputation confidence INFO>0.4); and (b) number of confidently imputed SNVs (imputed with imputation confidence INFO>0.8).

These results show that imputation confidence (measured as INFO-value) decreases substantially as the allele frequency of the imputed variants declines (Supplementary Figure 2). Although the larger and more diverse IRPs contained more variants, they contained fewer haplotypes matching the Estonian samples than the population-specific panel. As a result, the HRC and 1000G panels yielded genotypes imputed with lower confidence (INFO-value), especially for rare SNVs (Supplementary Figure 3). Of the combined reference panels, EGCUT+1000G gave almost identical results in every respect to the EGCUT panel alone, while 1000G+EGCUT showed a slight gain for common and low-frequency variants and a substantial gain for rare variants compared with the 1000G panel alone (Figure 2).

Imputation of loss-of-function and missense variants

Loss-of-function (LoF) variants that disrupt protein-coding genes and missense variants that cause amino acid changes are of particular interest because they are potentially clinically relevant. Considering only confidently imputed SNVs (INFO>0.8), we observed that all three reference panels enabled imputation of a similar number of common LoF and missense variants (Figure 3). However, the number of low-frequency LoF variants was higher with the population-specific IRP, and the number of rare LoF variants was about 1.7 times as high with the population-specific IRP (417, 439 and 730 LoF SNVs with the 1000G, HRC and EGCUT panels, respectively; Supplementary Table 3).

Figure 3

Number of common (MAF≥5%), low-frequency (0.5≤MAF<5%) and rare (MAF<0.5%) LoF (a) and missense (b) variants imputed from different IRPs with INFO-value>0.4 (bars) and INFO-value>0.8 (shaded areas).

Imputation sensitivity and accuracy

Although imputation confidence estimates (such as INFO-values or squared correlations r²)[45, 46] are useful for characterising the overall success of imputation, high INFO or r² values do not guarantee that the corresponding genotypes are inferred correctly. It is therefore important to assess the accuracy of the imputed genotypes directly. We compared the ‘best guess’ genotypes imputed from the different reference panels with WES data available for a subset of the imputed EGCUT individuals (N=505; Supplementary Figure 1B). Treating these WES-based genotype calls as the gold standard, we calculated two metrics for each imputed data set: (i) sensitivity, defined as the proportion of WES-based non-reference (NR) genotype calls that were also obtained through imputation; and (ii) discordancy rate, defined as the proportion of imputed NR genotype calls that were incorrect. For well-imputed common SNVs, all IRPs gave similarly high sensitivity (88.5–92.4%) (Figure 4a). For low-frequency SNVs, the three panels that included the population-specific panel (EGCUT, EGCUT+1000G and 1000G+EGCUT) yielded higher sensitivity (~87%) than the more diverse panels (78% and 76% for HRC and 1000G, respectively) (Table 3). For rare SNVs, the relative difference was even greater (40%, 42% and 49% for the 1000G, HRC and EGCUT IRPs, respectively).
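The two metrics can be written out as follows, using hard-called genotypes coded as alternative-allele counts (None for missing). `nr_sensitivity` and `nr_discordancy` follow the definitions given here and in the Figure 4 legend; `nrd_all_nonref` is a commonly used alternative formulation of non-reference discordance, included only for comparison and not as a description of what GATK GenotypeConcordance reports. All three are illustrative helpers of our own and assume at least one qualifying genotype.

```python
from typing import Optional, Sequence

Genotypes = Sequence[Optional[int]]  # 0/1/2 alternative-allele counts, None = missing


def is_nr(g: Optional[int]) -> bool:
    """Non-reference: heterozygous (1) or homozygous alternative (2)."""
    return g is not None and g > 0


def nr_sensitivity(truth: Genotypes, imputed: Genotypes) -> float:
    """Share of truth (WES) NR genotypes that imputation also called NR."""
    pairs = [(t, i) for t, i in zip(truth, imputed) if is_nr(t)]
    return sum(is_nr(i) for _, i in pairs) / len(pairs)


def nr_discordancy(truth: Genotypes, imputed: Genotypes) -> float:
    """Share of imputed NR genotypes (with a truth call) that differ from the truth."""
    pairs = [(t, i) for t, i in zip(truth, imputed) if is_nr(i) and t is not None]
    return sum(t != i for t, i in pairs) / len(pairs)


def nrd_all_nonref(truth: Genotypes, imputed: Genotypes) -> float:
    """Alternative: discordant pairs among called pairs where either genotype is NR."""
    pairs = [(t, i) for t, i in zip(truth, imputed)
             if t is not None and i is not None and (t > 0 or i > 0)]
    return sum(t != i for t, i in pairs) / len(pairs)
```

The two discordance variants differ in whether truth-NR genotypes imputed as homozygous reference count as errors, which is why values computed with different tools are not always directly comparable.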

Figure 4

Imputation accuracy for common (MAF≥5%), low-frequency (0.5≤MAF<5%) and rare (MAF<0.5%) well-imputed variants (INFO>0.4) imputed from different IRPs. (a) Non-reference (NR) sensitivity—proportion of whole-exome sequencing (WES) based NR variant calls that were also retrieved through imputation process. (b) NR discordancy rate—proportion of NR variants that were retrieved through imputation process but had incorrect genotype calls as compared to the WES genotypes.

Table 3

Genotype concordance of well-imputed SNVs (INFO>0.4)

Non-reference sensitivity and discordancy rate (number of NR genotypes analysed, in millions, in parentheses)
Reference panel	MAF≥5%: sensitivity	MAF≥5%: discordancy rate	MAF 0.5–5%: sensitivity	MAF 0.5–5%: discordancy rate	MAF<0.5%: sensitivity	MAF<0.5%: discordancy rate
1000G	88.5% (24.3)	3.4% (22.0)	75.9% (2.4)	14.0% (2.1)	39.9% (0.7)	24.7% (0.4)
HRC	89.4% (24.1)	2.1% (21.9)	77.8% (2.4)	8.2% (2.0)	41.9% (0.7)	17.0% (0.4)
EGCUT	91.4% (24.3)	1.9% (22.5)	87.2% (2.4)	6.1% (2.2)	48.6% (0.7)	14.1% (0.4)
EGCUT+1000G	91.5% (24.3)	2.1% (22.6)	87.2% (2.4)	6.3% (2.2)	49.0% (0.7)	13.6% (0.4)
1000G+EGCUT	92.4% (24.3)	2.2% (22.8)	87.1% (2.4)	6.5% (2.2)	49.9% (0.7)	14.3% (0.4)

Abbreviations: EGCUT, Estonian Genome Center, University of Tartu; HRC, Haplotype Reference Consortium; IRP, imputation reference panel; MAF, minor allele frequency; NR, non-reference; SNV, single-nucleotide variant; WES, whole-exome sequencing.

The ‘best guess’ genotype calls obtained with different IRPs were compared to the WES data while treating the WES-based genotype calls as ‘gold standard’. Imputation sensitivity—proportion of WES-based non-reference variant calls that were also obtained through imputation process—and discordancy rate (proportion of NR variant calls that were obtained through imputation process but which had incorrect genotype calls) were calculated.

Similarly, the population-specific IRP performed better with respect to discordancy rate (Figure 4b). Whereas all panels had a low discordancy rate for common variants (1.9–3.4%), the EGCUT panel outperformed the other panels for low-frequency and rare SNVs (Table 3). Notably, one-quarter (24.7%) of rare SNVs imputed from the 1000G IRP had incorrect genotype calls, whereas the proportion was substantially lower with the EGCUT IRP alone (14.1%) or when it was used in combination with the 1000G panel (13.6% and 14.3% for the EGCUT+1000G and 1000G+EGCUT panels, respectively). Similar results were seen for confidently imputed variants, for which both sensitivity and discordancy rate were better with the population-specific reference panel (Supplementary Figure 4, Supplementary Table 4). The better performance most likely reflects the close genetic match between the EGCUT IRP and the Estonian samples, which matters most for rare variants because they tend to be more recent and thus more population-specific. We repeated these analyses of imputation accuracy using finer MAF bins (Supplementary Figures 5–9). Although the overall success of imputation of well-imputed variants decreased steadily with decreasing MAF for all compared IRPs, imputation accuracy, especially for rare variants, was significantly better with the population-specific IRP (Supplementary Figure 7) or when it was used together with the 1000G reference panel (Supplementary Figures 8 and 9).

Discussion

Genotype imputation is a cost-efficient way to improve the power and resolution of GWA studies. Although large IRPs from diverse populations work reasonably well for imputing common and low-frequency variants, currently available reference panels allow only limited imputation of rare variants. WGS has become increasingly widespread in recent years and is increasingly used to create IRPs. The first step in creating an IRP is haplotype phasing, that is, correctly assigning the alleles at polymorphic positions to each individual's two haplotypes. Although this task can be computationally demanding for large data sets, the advent of various phasing algorithms has simplified it considerably. We compared the performance of SHAPEIT2 and Eagle2, both of which can reduce phasing time by splitting the computation into parts that are processed in parallel on multiple cores. Consistent with a previously published comparison,[38] we found that Eagle2 was considerably faster than SHAPEIT2. However, the shorter phasing time came with a small increase in haplotype switch errors, making SHAPEIT2 the better choice for those aiming at the highest accuracy. Interestingly, we did not observe a difference in phasing accuracy between SHAPEIT2 and SHAPEIT2's read-aware mode. This may be due to the relatively homogeneous nature of our Estonian samples, and the read-aware mode of SHAPEIT2 may offer advantages for more heterogeneous data sets.
Consistent with previous studies, our results show that population-specific IRPs can improve genotype imputation, especially for low-frequency and rare variants.[21, 22, 23, 24] By genuinely reflecting the study data set, population-specific IRPs can facilitate the discovery of true associations in GWAS and subsequent fine-mapping of causal variants, as demonstrated by others[24, 47, 48] and also with the Estonian population-specific reference panel.[49] Although the large IRPs from more diverse populations led to the imputation of a larger number of rare SNVs, a large proportion of these genotypes were imputed with low confidence (IMPUTE2 INFO-value). When only confidently imputed SNVs were considered, the population-specific IRP outperformed the 1000G and HRC IRPs. Although overall imputation success and accuracy depend on several factors (including the size of the IRP and the genetic structure of the reference panel and of the genotyped sample), these observations are expected to apply to other populations with a similar genetic background. Beyond imputation confidence, we also assessed the sensitivity and discordancy rate of the imputed genotypes and found that the population-specific IRP outperformed the large IRPs from diverse populations, a finding in line with other recent comparisons of imputation accuracy.[50] Using a large IRP that is not well matched in ancestry can thus not only limit the discovery of associations in GWAS, as observed previously,[24] but also introduce variants that are not actually polymorphic in the imputed sample.[50] Because short insertion-deletion (indel) variants were not part of the HRC IRP, and because indel calling is still more error-prone than SNV calling, we did not include indels in our IRP or our comparisons. Once the technical limitations of indel calling and phasing are resolved, indels should be included in all IRPs.
In conclusion, we observe that although currently publicly accessible large, diverse IRPs such as 1000G and HRC enable the imputation of many low-frequency and rare variants in the Estonian population, most of these variants are imputed with relatively low confidence; furthermore, a significant proportion of population-specific variation cannot be imputed from these panels. Imputation of low-frequency and rare variants is considerably more accurate with a population-specific reference panel, or when such a panel is used in combination with a publicly available reference such as the 1000G panel. Our results also suggest that, given that the population-specific reference panel is comparable in size (number of haplotypes) to the 1000G panel, the previous observation that reference sample size is more important than precise population matching does not apply equally well to all populations, and that population-specific panels can outperform reference panels that are an order of magnitude larger but more diverse.

49 in total

1. Large-scale whole-genome sequencing of the Icelandic population.

Authors: Daniel F Gudbjartsson; Hannes Helgason; Sigurjon A Gudjonsson; Florian Zink; Asmundur Oddson; Arnaldur Gylfason; Soren Besenbacher; Gisli Magnusson; Bjarni V Halldorsson; Eirikur Hjartarson; Gunnar Th Sigurdsson; Simon N Stacey; Michael L Frigge; Hilma Holm; Jona Saemundsdottir; Hafdis Th Helgadottir; Hrefna Johannsdottir; Gunnlaugur Sigfusson; Gudmundur Thorgeirsson; Jon Th Sverrisson; Solveig Gretarsdottir; G Bragi Walters; Thorunn Rafnar; Bjarni Thjodleifsson; Einar S Bjornsson; Sigurdur Olafsson; Hildur Thorarinsdottir; Thora Steingrimsdottir; Thora S Gudmundsdottir; Asgeir Theodors; Jon G Jonasson; Asgeir Sigurdsson; Gyda Bjornsdottir; Jon J Jonsson; Olafur Thorarensen; Petur Ludvigsson; Hakon Gudbjartsson; Gudmundur I Eyjolfsson; Olof Sigurdardottir; Isleifur Olafsson; David O Arnar; Olafur Th Magnusson; Augustine Kong; Gisli Masson; Unnur Thorsteinsdottir; Agnar Helgason; Patrick Sulem; Kari Stefansson
Journal: Nat Genet Date: 2015-03-25 Impact factor: 38.330

2. A second generation human haplotype map of over 3.1 million SNPs.

Authors: Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal: Nature Date: 2007-10-18 Impact factor: 49.962

3. Meta-analysis of Genome-wide Association Studies for Neuroticism, and the Polygenic Association With Major Depressive Disorder.

Authors: Marleen H M de Moor; Stéphanie M van den Berg; Karin J H Verweij; Robert F Krueger; Michelle Luciano; Alejandro Arias Vasquez; Lindsay K Matteson; Jaime Derringer; Tõnu Esko; Najaf Amin; Scott D Gordon; Narelle K Hansell; Amy B Hart; Ilkka Seppälä; Jennifer E Huffman; Bettina Konte; Jari Lahti; Minyoung Lee; Mike Miller; Teresa Nutile; Toshiko Tanaka; Alexander Teumer; Alexander Viktorin; Juho Wedenoja; Goncalo R Abecasis; Daniel E Adkins; Arpana Agrawal; Jüri Allik; Katja Appel; Timothy B Bigdeli; Fabio Busonero; Harry Campbell; Paul T Costa; George Davey Smith; Gail Davies; Harriet de Wit; Jun Ding; Barbara E Engelhardt; Johan G Eriksson; Iryna O Fedko; Luigi Ferrucci; Barbara Franke; Ina Giegling; Richard Grucza; Annette M Hartmann; Andrew C Heath; Kati Heinonen; Anjali K Henders; Georg Homuth; Jouke-Jan Hottenga; William G Iacono; Joost Janzing; Markus Jokela; Robert Karlsson; John P Kemp; Matthew G Kirkpatrick; Antti Latvala; Terho Lehtimäki; David C Liewald; Pamela A F Madden; Chiara Magri; Patrik K E Magnusson; Jonathan Marten; Andrea Maschio; Sarah E Medland; Evelin Mihailov; Yuri Milaneschi; Grant W Montgomery; Matthias Nauck; Klaasjan G Ouwens; Aarno Palotie; Erik Pettersson; Ozren Polasek; Yong Qian; Laura Pulkki-Råback; Olli T Raitakari; Anu Realo; Richard J Rose; Daniela Ruggiero; Carsten O Schmidt; Wendy S Slutske; Rossella Sorice; John M Starr; Beate St Pourcain; Angelina R Sutin; Nicholas J Timpson; Holly Trochet; Sita Vermeulen; Eero Vuoksimaa; Elisabeth Widen; Jasper Wouda; Margaret J Wright; Lina Zgaga; David Porteous; Alessandra Minelli; Abraham A Palmer; Dan Rujescu; Marina Ciullo; Caroline Hayward; Igor Rudan; Andres Metspalu; Jaakko Kaprio; Ian J Deary; Katri Räikkönen; James F Wilson; Liisa Keltikangas-Järvinen; Laura J Bierut; John M Hettema; Hans J Grabe; Cornelia M van Duijn; David M Evans; David Schlessinger; Nancy L Pedersen; Antonio Terracciano; Matt McGue; Brenda W J H Penninx; Nicholas G Martin; Dorret I Boomsma
Journal: JAMA Psychiatry Date: 2015-07 Impact factor: 21.596

4. Genotype imputation with thousands of genomes.

Authors: Bryan Howie; Jonathan Marchini; Matthew Stephens
Journal: G3 (Bethesda) Date: 2011-11-01 Impact factor: 3.154

5. Genome of The Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels.

Authors: Elisabeth M van Leeuwen; Lennart C Karssen; Joris Deelen; Aaron Isaacs; Carolina Medina-Gomez; Hamdi Mbarek; Alexandros Kanterakis; Stella Trompet; Iris Postmus; Niek Verweij; David J van Enckevort; Jennifer E Huffman; Charles C White; Mary F Feitosa; Traci M Bartz; Ani Manichaikul; Peter K Joshi; Gina M Peloso; Patrick Deelen; Freerk van Dijk; Gonneke Willemsen; Eco J de Geus; Yuri Milaneschi; Brenda W J H Penninx; Laurent C Francioli; Androniki Menelaou; Sara L Pulit; Fernando Rivadeneira; Albert Hofman; Ben A Oostra; Oscar H Franco; Irene Mateo Leach; Marian Beekman; Anton J M de Craen; Hae-Won Uh; Holly Trochet; Lynne J Hocking; David J Porteous; Naveed Sattar; Chris J Packard; Brendan M Buckley; Jennifer A Brody; Joshua C Bis; Jerome I Rotter; Josyf C Mychaleckyj; Harry Campbell; Qing Duan; Leslie A Lange; James F Wilson; Caroline Hayward; Ozren Polasek; Veronique Vitart; Igor Rudan; Alan F Wright; Stephen S Rich; Bruce M Psaty; Ingrid B Borecki; Patricia M Kearney; David J Stott; L Adrienne Cupples; J Wouter Jukema; Pim van der Harst; Eric J Sijbrands; Jouke-Jan Hottenga; Andre G Uitterlinden; Morris A Swertz; Gert-Jan B van Ommen; Paul I W de Bakker; P Eline Slagboom; Dorret I Boomsma; Cisca Wijmenga; Cornelia M van Duijn
Journal: Nat Commun Date: 2015-03-09 Impact factor: 14.919

6. Fast and accurate long-range phasing in a UK Biobank cohort.

Authors: Po-Ru Loh; Pier Francesco Palamara; Alkes L Price
Journal: Nat Genet Date: 2016-06-06 Impact factor: 38.330

7. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

8. Biological insights from 108 schizophrenia-associated genetic loci.

Authors:
Journal: Nature Date: 2014-07-22 Impact factor: 49.962

9. The UK10K project identifies rare variants in health and disease.

Authors: Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal: Nature Date: 2015-09-14 Impact factor: 49.962

10. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers.

Authors: Carlo Sidore; Fabio Busonero; Andrea Maschio; Eleonora Porcu; Silvia Naitza; Magdalena Zoledziewska; Antonella Mulas; Giorgio Pistis; Maristella Steri; Fabrice Danjou; Alan Kwong; Vicente Diego Ortega Del Vecchyo; Charleston W K Chiang; Jennifer Bragg-Gresham; Maristella Pitzalis; Ramaiah Nagaraja; Brendan Tarrier; Christine Brennan; Sergio Uzzau; Christian Fuchsberger; Rossano Atzeni; Frederic Reinier; Riccardo Berutti; Jie Huang; Nicholas J Timpson; Daniela Toniolo; Paolo Gasparini; Giovanni Malerba; George Dedoussis; Eleftheria Zeggini; Nicole Soranzo; Chris Jones; Robert Lyons; Andrea Angius; Hyun M Kang; John Novembre; Serena Sanna; David Schlessinger; Francesco Cucca; Gonçalo R Abecasis
Journal: Nat Genet Date: 2015-09-14 Impact factor: 38.330
