Adam E Locke1,2,3, Karyn Meltz Steinberg2,4, Charleston W K Chiang5,6,7, Susan K Service5, Aki S Havulinna8,9, Laurel Stell10, Matti Pirinen8,11,12, Haley J Abel2,13, Colby C Chiang2, Robert S Fulton2,13, Anne U Jackson3, Chul Joo Kang2, Krishna L Kanchi2, Daniel C Koboldt2,14,15, David E Larson2,13, Joanne Nelson2, Thomas J Nicholas2,16, Arto Pietilä9, Vasily Ramensky5,17, Debashree Ray3,18, Laura J Scott3, Heather M Stringham3, Jagadish Vangipurapu19, Ryan Welch3, Pranav Yajnik3, Xianyong Yin3, Johan G Eriksson20,21,22, Mika Ala-Korpela23,24,25,26,27,28, Marjo-Riitta Järvelin29,30,31,32,33, Minna Männikkö30,34, Hannele Laivuori8,35,36, Susan K Dutcher2,13, Nathan O Stitziel2,37, Richard K Wilson2,14,15, Ira M Hall1,2, Chiara Sabatti10,38, Aarno Palotie8,39,40, Veikko Salomaa9, Markku Laakso19,41, Samuli Ripatti8,11,40, Michael Boehnke42, Nelson B Freimer43. 1. Department of Medicine, Washington University School of Medicine, St Louis, MO, USA. 2. McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA. 3. Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA. 4. Department of Pediatrics, Washington University School of Medicine, St Louis, MO, USA. 5. Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA. 6. Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA. 7. Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA. 8. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland. 9. National Institute for Health and Welfare, Helsinki, Finland. 10. Department of Biomedical Data Science, Stanford University, Stanford, CA, USA. 11. 
Department of Public Health, University of Helsinki, Helsinki, Finland. 12. Helsinki Institute for Information Technology HIIT and Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland. 13. Department of Genetics, Washington University School of Medicine, St Louis, MO, USA. 14. The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA. 15. Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA. 16. USTAR Center for Genetic Discovery and Department of Human Genetics, University of Utah, Salt Lake City, UT, USA. 17. Federal State Institution "National Medical Research Center for Preventive Medicine" of the Ministry of Healthcare of the Russian Federation, Moscow, Russia. 18. Departments of Epidemiology and Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA. 19. Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland. 20. Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland. 21. Folkhälsan Research Center, Helsinki, Finland. 22. Department of General Practice and Primary Health Care, University of Helsinki, Helsinki and Helsinki University Hospital, Helsinki, Finland. 23. Systems Epidemiology, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia. 24. Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, University of Oulu, Oulu, Finland. 25. NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland. 26. Population Health Science, Bristol Medical School, University of Bristol, Bristol, UK. 27. Medical Research Council Integrative Epidemiology Unit at the University of Bristol, Bristol, UK. 28. 
Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, The Alfred Hospital, Monash University, Melbourne, Victoria, Australia. 29. Biocenter Oulu, University of Oulu, Oulu, Finland. 30. Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland. 31. Unit of Primary Health Care, Oulu University Hospital, Oulu, Finland. 32. Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK. 33. Department of Life Sciences, College of Health and Life Sciences, Brunel University London, London, UK. 34. Northern Finland Birth Cohorts, Faculty of Medicine, University of Oulu, Oulu, Finland. 35. Medical and Clinical Genetics, University of Helsinki and Helsinki University Hospital, Helsinki, Finland. 36. Department of Obstetrics and Gynecology, Tampere University Hospital and University of Tampere, Faculty of Medicine and Health Technology, Tampere, Finland. 37. Cardiovascular Division, Department of Medicine, Washington University School of Medicine, St Louis, MO, USA. 38. Department of Statistics, Stanford University, Stanford, CA, USA. 39. Analytical and Translational Genetics Unit (ATGU), Psychiatric & Neurodevelopmental Genetics Unit, Departments of Psychiatry and Neurology, Massachusetts General Hospital, Boston, MA, USA. 40. Broad Institute of MIT and Harvard, Cambridge, MA, USA. 41. Department of Medicine, Kuopio University Hospital, Kuopio, Finland. 42. Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA. boehnke@umich.edu. 43. Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA. nfreimer@mednet.ucla.edu.
Abstract
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits, as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.
Most alleles with a demonstrated deleterious effect on phenotypes directly alter protein structure or function[1,2]. Exome-sequencing studies aim to discover such alleles and demonstrate their association with common diseases and disease-related quantitative traits. However, exome-sequencing studies to date have generally identified few newly associated rare variants or genes[3,4]. The sample size required for such discoveries remains uncertain, and theoretical analyses indicate that studies to date have been underpowered, since most deleterious variants are expected to be rare due to purifying selection[5]. These analyses also suggest that power to detect associations with deleterious alleles is greatest in populations that have expanded in isolation after recent bottlenecks, as alleles passing through the bottlenecks may rise to much higher frequencies than in other populations[6-8].
Finland exemplifies such a history. Bottlenecks occurred at the founding of early-settlement regions (southern and western Finland) 2,000-4,000 years ago and again with internal migration to late-settlement regions (northern and eastern Finland) in the 15th and 16th centuries[9]. Finland’s subsequent population growth (to ~5.5 million) generated sizable geographic sub-isolates in late-settlement regions.
This unique population history has resulted in “the Finnish Disease Heritage”[10], 36 Mendelian diseases that are much more common in Finns than in other Europeans. These disorders concentrate in late-settlement regions of Finland[10], and the genes responsible for them exhibit extreme enrichment of deleterious variants[11-13].
We created the FinMetSeq study to capitalize on the population history of late-settlement Finland, aiming to discover rare-variant associations with cardiovascular and metabolic disease-relevant quantitative traits through exome sequencing of two extensively phenotyped population cohorts, FINRISK and METSIM (Methods).
We successfully sequenced 19,292 FinMetSeq participants and tested the identified variants for association with 64 clinically relevant quantitative traits, discovering 43 novel associations with deleterious variants[14,15]: 19 associations (11 traits) in FinMetSeq alone and 24 associations (20 traits) in a combined analysis of FinMetSeq with 24,776 Finns from three cohorts with imputed genome-wide genotypes. Nineteen of the 26 variants underlying these 43 associations were unique to Finland or enriched >20-fold in FinMetSeq compared to non-Finnish Europeans (NFE). These enriched alleles cluster geographically like Finnish Disease Heritage mutations, indicating that the distribution of trait-associated rare alleles may vary significantly between locations within a country.
We demonstrate that exome sequencing in a historically isolated population that expanded after recent population bottlenecks is an extraordinarily efficient strategy for discovering alleles with a substantial effect on quantitative traits. As most of the novel, putatively deleterious trait-associated variants that we identified are unique to or highly enriched in Finland, we estimate that similarly powered studies of these variants in non-Finnish populations might require hundreds of thousands or millions of participants.
Results
Genetic variation
In 19,292 successfully sequenced exomes, we identified 1,318,781 single nucleotide variants (SNVs) and 92,776 insertion/deletion (indel) variants (Supplementary Tables 1-3, Supplementary Information). Compared to NFE control exomes (gnomAD v2.1, Extended Data Fig. 1A), FinMetSeq exomes showed a depletion of singletons and doubletons and an excess of variants with minor allele count (MAC)≥5, particularly for predicted-deleterious alleles (Extended Data Fig. 1B).
Extended Data Fig. 1
Allele frequency comparisons between FinMetSeq and NFE from gnomAD.
A) Distribution of allelic frequencies between FinMetSeq and gnomAD NFE. The comparison of allele frequencies shows the excess of variants at higher frequency in Finland as a result of the multiple bottlenecks experienced in Finnish population history.
B) Proportional site frequency spectra for FinMetSeq and gnomAD NFE by variant annotation class. In general, FinMetSeq shows a depletion of variants in the rarest frequency class and an enrichment of variants in the intermediate to common frequency range. The site frequency spectra were down-sampled to 18,000 chromosomes for each dataset.
C) Comparison of MAFs for trait-associated variants in FinMetSeq and NFE gnomAD. Plotted in gray background is a 2-D histogram of variants with non-zero allele frequencies in both gnomAD and FinMetSeq but no trait associations. Variants associated with at least one trait are colored and scaled inversely proportional to the logarithm of the association P-value. Variants >10x enriched in FinMetSeq compared to NFE are pink; those <10x enriched are blue. The dashed line is the line of equal frequency. Two-sided uncorrected P-values are from a regression of trait on the count of alternative alleles at each variant. The number of independent individuals used in each point is listed in Supplementary Table 5.
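The caption of panel B states that both spectra were down-sampled to 18,000 chromosomes but not how. A common choice, assumed in the minimal sketch below, is hypergeometric projection: a site observed with k alternative alleles among n chromosomes contributes to bin j of the smaller sample with the probability of drawing j alternative alleles when sampling n_to chromosomes without replacement. `project_sfs` and `proportional` are hypothetical helper names, and the example spectrum is toy data.

```python
from math import comb


def project_sfs(sfs, n_to):
    """Down-sample an unfolded SFS by hypergeometric projection.

    sfs[k] is the number of sites at which k of the n_from sampled
    chromosomes carry the alternative allele (k = 0..n_from).
    Returns the expected SFS in a subsample of n_to chromosomes.
    """
    n_from = len(sfs) - 1
    if not 0 < n_to <= n_from:
        raise ValueError("n_to must be between 1 and the original sample size")
    total = comb(n_from, n_to)
    projected = [0.0] * (n_to + 1)
    for k, count in enumerate(sfs):
        if count == 0:
            continue
        # j alternative alleles among n_to chromosomes drawn without replacement
        for j in range(max(0, n_to - (n_from - k)), min(k, n_to) + 1):
            projected[j] += count * comb(k, j) * comb(n_from - k, n_to - j) / total
    return projected


def proportional(sfs):
    """Fraction of segregating sites in each bin 1..n-1 (monomorphic bins dropped)."""
    segregating = sfs[1:-1]
    total = sum(segregating)
    return [x / total for x in segregating]
```

For example, `project_sfs([0, 2, 1, 1, 0], 2)` returns approximately `[1.167, 2.167, 0.667]`, preserving the total of four sites. The exact-integer loop is fine for illustration but would be slow at the scale of 18,000 chromosomes, where a faster formulation would be needed.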
Association analyses
We tested for association between genetic variants in FinMetSeq and 64 clinically relevant quantitative traits after standard adjustments for medications and covariates and transformation to normality for analyses (Methods, Supplementary Tables 4 & 5). Sixty-two of 64 traits exhibited significant heritability with common SNVs (P<0.05; 5%
Extended Data Figure 2
Heritability of and correlations between traits.
Traits are in the same order, clockwise in A, and left to right and top to bottom in B, following the trait group color key.
A) Heritability estimated in 13,342 unrelated individuals (for abbreviations see Supplementary Table 4; for details see Supplementary Table 6).
B) Heatmap of: 1) absolute Pearson correlations of standardized trait values in upper triangle; 2) absolute values of estimated pairwise genetic correlations in lower triangle. Genetic correlations are estimated in 13,342 unrelated individuals. Values below the diagonal in gray had trait heritability less than 1.5 times the SE of heritability.
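The transformation to normality applied before association testing (compare the "covariate adjusted and inverse-normal transformed" phenotypes plotted in Extended Data Figs. 4-6) can be illustrated with a rank-based inverse normal transform of covariate residuals. This is a minimal sketch, assuming Blom's offset c = 3/8, average ranks for ties, and adjustment for a single covariate; the actual covariate set and medication adjustments are described in Methods and are not reproduced here. Function names are illustrative.

```python
from statistics import NormalDist, fmean


def residualize(y, x):
    """Residuals of y after least-squares regression on one covariate x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def rank_inverse_normal(values, c=3 / 8):
    """Map values to standard normal quantiles of their ranks; ties get their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

In use, one would call `rank_inverse_normal(residualize(trait, covariate))`, so that extreme trait values cannot dominate the downstream regressions.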
Single-variant association tests of genetic variants with MAC≥3 among the 3,558 to 19,291 individuals measured for each trait (Supplementary Tables 4 & 5) identified 1,249 associations (P<5×10⁻⁷) at 531 variants (Supplementary Table 7); 53 traits were associated with ≥1 variant (Fig. 1A). All 1,249 associations remained significant after multiple-testing adjustment (exome-wide and across the 64 traits, using a hierarchical procedure that controls the average FDR at 5%; Methods). Applying this procedure to the 531 associated variants, we detected 287 additional associations (Supplementary Table 8), most reflecting high correlation between lipid traits. Of the 531 variants, those at >10x frequency in FinMetSeq compared to NFE were more likely to be trait-associated (OR=4.92, P=2.6×10⁻⁵; Extended Data Fig. 1C).
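The two-level FDR control described above can be sketched as follows. This assumes a Benjamini-Bogomolov-style procedure: each variant's P-values across traits are summarized by a Simes P-value, Benjamini-Hochberg (BH) selects variants at level q, and BH is then applied across traits within each selected variant at the reduced level q·R/M (R selected out of M variants). The exact procedure is given in Methods and may differ in detail; all names here are hypothetical.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank  # largest rank whose P-value meets its BH threshold
    return set(order[:k_max])


def simes(pvals):
    """Simes combination of one variant's P-values across traits."""
    m = len(pvals)
    return min(m * p / rank for rank, p in enumerate(sorted(pvals), start=1))


def hierarchical_fdr(pmatrix, q=0.05):
    """pmatrix[v][t] is the P-value for variant v and trait t.

    Returns {variant index: [trait indices declared significant]}.
    """
    n_variants = len(pmatrix)
    selected = bh_reject([simes(row) for row in pmatrix], q)
    if not selected:
        return {}
    q_within = q * len(selected) / n_variants  # level adjusted for variant selection
    return {v: sorted(bh_reject(pmatrix[v], q_within)) for v in sorted(selected)}
```

With three toy variants and three traits, `hierarchical_fdr([[1e-9, 0.5, 0.04], [0.3, 0.6, 0.9], [0.01, 0.02, 0.8]])` selects variants 0 and 2 and returns `{0: [0], 2: [0, 1]}`; the within-variant threshold shrinks because only two of three variants were selected.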
Figure 1
Characterization of associations.
A) Number of genomic loci associated with each trait. Bars are subdivided into common (MAF>1%, dark blue) and rare (MAF≤1%, light blue).
B) Relationship between estimated heritability and number of loci detected per trait. Each trait is colored by trait group. Vertical bars indicate ±2 standard errors. The gray line shows the linear regression fit to indicate the general trend. The number of independent individuals used in each point is listed in Supplementary Table 5. Height is the notable outlier.
After clumping associated variants within 1 Mbp and with r²>0.5 into single loci (Methods), the 531 associated variants represented 262 distinct loci (597 trait-locus pairs, Supplementary Table 7). The number of associated loci per trait correlated positively with trait heritability (r=0.38, P=8.8×10⁻⁴), with height a notable outlier (Fig. 1B). Most variants and loci (61%) were associated with a single trait; 4% were associated with ≥10 traits. Overlapping associations (Extended Data Fig. 3A) reflect both phenotypic and genetic correlations, and the estimated genetic correlation of a trait pair predicts the extent of locus sharing between the two traits (Extended Data Fig. 3B). Gene-based association tests revealed 54 associations with P<3.88×10⁻⁶ and multi-trait FDR<0.05 (Methods, Supplementary Table 9), including ten traits associated with APOB (Extended Data Fig. 4) and a novel association of SECTM1 with HDL2-C (Extended Data Fig. 5).
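The clumping step can be illustrated with a greedy procedure: take the most significant remaining variant as a locus lead and merge into its locus every remaining variant on the same chromosome within 1 Mbp that has r²>0.5 with the lead. This is a sketch under that assumption (the text does not specify, for instance, whether r² is measured against the lead variant); `Variant`, `clump`, and the `r2` callback are hypothetical, and in practice r² would be estimated from the study genotypes.

```python
from collections import namedtuple

Variant = namedtuple("Variant", "name chrom pos p")


def clump(variants, r2, window=1_000_000, r2_min=0.5):
    """Greedy clumping: the most significant remaining variant absorbs nearby variants in LD."""
    remaining = sorted(variants, key=lambda v: v.p)
    loci = []
    while remaining:
        lead, rest = remaining[0], remaining[1:]
        members, unclumped = [lead.name], []
        for v in rest:
            nearby = v.chrom == lead.chrom and abs(v.pos - lead.pos) <= window
            if nearby and r2(lead.name, v.name) > r2_min:
                members.append(v.name)
            else:
                unclumped.append(v)
        loci.append(members)
        remaining = unclumped
    return loci
```

A variant in strong LD with a lead but more than 1 Mbp away still forms its own locus under these rules, which is the behavior the distance window is meant to enforce.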
Extended Data Fig. 3
Properties of associations shared between traits.
A) Shared genomic associations by pairs of traits. For traits x and y, color in row x and column y reflects the number of loci associated with both traits divided by the number of loci associated with trait x. Traits are presented in the same order as in Extended Data Figure 2A, and the side and top color bars reflect trait groups.
B) Relationship between estimated genetic correlation and extent of sharing of genetic associations. For each trait pair, the extent of locus sharing is defined as the number of loci associated with both traits divided by the total number of loci associated with either trait. Analysis using the absolute value of the Pearson correlation of the residual series results in a very similar pattern. The number of trait pairs in each x-axis category is as follows: 0-1%: 819; 1-10%: 204; 11-20%: 102; 21-30%: 41; 31-40%: 29; 41-50%: 16; >50%: 13. The bar within each box is the median, the box represents the upper and lower quartiles, whiskers extend to 1.5x the interquartile range, and points represent outliers.
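The two sharing measures defined for panels A and B reduce to simple set operations on the loci associated with each trait. The sketch below assumes loci are represented by identifiers, such as clump indices; the function names are illustrative. The panel A measure is asymmetric, whereas the panel B measure (a Jaccard index) is symmetric.

```python
def directional_sharing(loci_x, loci_y):
    """Panel A: loci associated with both traits / loci associated with trait x."""
    return len(loci_x & loci_y) / len(loci_x) if loci_x else 0.0


def symmetric_sharing(loci_x, loci_y):
    """Panel B: loci associated with both traits / loci associated with either trait."""
    either = loci_x | loci_y
    return len(loci_x & loci_y) / len(either) if either else 0.0
```

For loci sets {1, 2, 3, 4} and {3, 4, 5}, the directional measure is 0.5 in one direction and about 0.67 in the other, while the symmetric measure is 0.4.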
Extended Data Fig. 4
Gene-based association of extremely rare variants in APOB with serum total cholesterol.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 19,291.
Extended Data Fig. 5
Gene-based association of rare variants in SECTM1 with HDL2 cholesterol.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test, along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 10,984.
To determine which of the 1,249 single-variant associations are distinct from previous GWAS findings, we repeated the association analysis for each trait, conditioning on published associated variants in the EBI GWAS Catalog (December 2016; Methods); 478 associations at 126 loci remained significant (P<5×10⁻⁷), including at least one association for 48 traits (Supplementary Table 10). Conditionally associated variants were more often rare (24% vs. 11%), more likely to be protein-altering (31% vs. 22%), and more frequently >10x enriched in FinMetSeq relative to NFE (19% vs. 10%) than associated variants overall.
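Conditioning on a known variant amounts to adding it as a covariate in the single-variant regression. A minimal sketch for one conditioning variant uses the Frisch-Waugh-Lovell result: residualize both the trait and the tested variant's allele count on the known variant, then regress residual on residual. The actual analysis may condition on several catalog variants at once, and the normal-approximation P-value below assumes large samples; `conditional_test` and `residualize` are hypothetical names.

```python
from statistics import NormalDist, fmean


def residualize(y, x):
    """Residuals of y after least-squares regression on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def conditional_test(trait, dosage, known):
    """Effect of `dosage` on `trait` adjusted for one known variant's allele count."""
    ry = residualize(trait, known)
    rg = residualize(dosage, known)
    sgg = sum(g * g for g in rg)
    beta = sum(g * y for g, y in zip(rg, ry)) / sgg
    residuals = [y - beta * g for g, y in zip(rg, ry)]
    df = len(trait) - 3  # intercept, known variant, tested variant
    se = (sum(r * r for r in residuals) / df / sgg) ** 0.5
    p = 2 * NormalDist().cdf(-abs(beta / se))  # large-sample normal approximation
    return beta, se, p
```

If the trait depends only on the known variant, the conditional effect of a correlated test variant shrinks toward zero, which is how associations explained by published signals drop out.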
Replication and follow-up
We attempted to replicate the 478 single-variant associations (unconditional and conditional P≤5×10⁻⁷) and follow up 2,120 sub-threshold associations from FinMetSeq (unconditional 5×10⁻⁷
16,17] participants not in FinMetSeq (n=18,215), Northern Finland Birth Cohort 1966[18] (n=5,139), and Helsinki Birth Cohort[19] (n=1,412), all imputed using the Finnish SISu v2 reference panel (www.sisuproject.fi). Following association analysis within each cohort, we conducted meta-analysis of the three imputation-based studies to test for replication of FinMetSeq variants (“replication analysis”), and four-study meta-analysis with FinMetSeq to follow up suggestive associations (“combined analysis”).
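The text does not state which meta-analysis model was used for the replication and combined analyses. A common default, assumed in this sketch, is fixed-effects inverse-variance weighting of per-cohort effect estimates on a common trait scale; `ivw_meta` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis of per-cohort estimates."""
    weights = [1 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = sqrt(1 / total_weight)
    p = 2 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p
```

For two cohorts with estimates 0.2 and 0.4 and equal standard errors of 0.1, `ivw_meta([0.2, 0.4], [0.1, 0.1])` gives a pooled estimate of 0.3 with a standard error of about 0.071; cohorts with smaller standard errors receive proportionally more weight.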
Of 448 significant variant-trait associations with replication data, 392 (87.5%) replicated at P<0.05 (Supplementary Table 11). Of the 1,417 sub-threshold associations, 431 reached P<5×10⁻⁷ in the combined analysis (Supplementary Table 12); >60% of variants we could not follow up were absent in the reference panel.
Among the significant associations from FinMetSeq or combined analysis, 43 were with 26 predicted deleterious variants (six PTVs, 20 missense) that conditional analysis and literature review suggest are novel (Table 1). Nineteen associations (15 variants) were significant in FinMetSeq (Table 1; Supplementary Table 11); another 24 associations (16 variants) reached significance in combined analysis (Table 1; Supplementary Table 12). Of these 43 associations, 34 were with 19 variants either seen only in Finland or enriched >20-fold in FinMetSeq compared to NFE. Identifying associations for these 19 variants would have required much larger samples in NFE populations than in FinMetSeq (Fig. 2A, B). We provide brief summaries relating some of these associations to known biology and prior genetic evidence (Table 1, expanded version in Supplementary Table 13, Supplementary Information), highlighting here the most striking findings.
Table 1
Novel associations with predicted deleterious variants from FinMetSeq alone or combined analysis.
| Chr:Pos (GRCh37) | Gene | FMS MAF | NFE MAF# | MAF ratio (95% CI) | Trait | FMS P | FMS Beta | Repl. or comb. P** | Repl. or comb. Beta |
|---|---|---|---|---|---|---|---|---|---|
| 1:55076137 | FAM151A | 0.099 | 0.0147 | 6.7 (6.1-7.5) | IDL-C | 5.4×10⁻¹⁶ | -0.187 | 2.1×10⁻¹⁷ | -0.191 |
| | | | | | IDL-P | 8.9×10⁻¹⁴ | -0.172 | 1.9×10⁻¹⁶ | -0.185 |
| 2:120848049 | EPB41L5 | 0.085 | 0.044 | 1.9 (1.8-2.1) | eGFR* | 1.7×10⁻⁶ | -0.093 | 4.8×10⁻¹² | -0.107 |
| | | | | | Creatinine* | 2.5×10⁻⁶ | 0.091 | 2.5×10⁻¹² | 0.098 |
| 3:125831672 | ALDH1L1 | 0.0026 | 0 | ∞ | Gly | 1.8×10⁻⁸ | -0.873 | 4.5×10⁻⁴ | -0.827 |
| 4:13612630 | BOD1L1 | 0.0001 | 0 | ∞ | WHR | 4.7×10⁻⁷ | -2.501 | NA | NA |
| 5:79336091 | THBS4 | 0.0045 | 0.0001 | 45 (14.4-140.9) | Weight* | 6.7×10⁻⁷ | -0.377 | 3.2×10⁻⁷ | -0.252 |
| 5:140181423 | PCDHA3 | 0.0001 | NA | NA | WHR | 2.7×10⁻⁷ | 2.559 | NA | NA |
| 9:107548661 | ABCA1 | 0.00023 | 0 | ∞ | HDL-C | 4.8×10⁻¹⁰ | -2.046 | NA | NA |
| 9:136501728 | DBH | 0.05 | 0.0021 | 23.8 (18.4-30.4) | Diast-BP* | 1.5×10⁻⁶ | -0.115 | 2.8×10⁻¹² | -0.11 |
| 11:47282929 | NR1H3 | 0.0042 | 0.00003 | 140 (19.5-1004.4) | HDL-C | 1.4×10⁻⁷ | 0.425 | 6.7×10⁻⁷ | 0.435 |
| | | | | | HDL2-C* | 3.2×10⁻⁶ | 0.473 | 1.3×10⁻⁸ | 0.458 |
| | | | | | VLDL-C* | 4.0×10⁻⁶ | -0.469 | 3.1×10⁻⁷ | -0.412 |
| 11:116692293 | APOA4 | 0.0096 | 0.012 | 0.8 (0.7-0.9) | HDL-C* | 2.2×10⁻⁵ | 0.225 | 1.5×10⁻⁷ | 0.196 |
| 11:117352857 | DSCAML1 | 0.016 | 0.0002 | 80 (35.7-179.3) | VLDL-C | 4.1×10⁻⁸ | 0.299 | 2.0×10⁻³ | 0.162 |
| 14:101198426 | DLK1 | 0.023 | 0.00013 | 177 (66.3-472.4) | Height* | 2.7×10⁻⁵ | -0.149 | 1.2×10⁻¹⁰ | -0.163 |
| 16:55862682 | CES1 | 0.0018 | 0.00003 | 60 (8.3-432.0) | HDL-C | 1.1×10⁻¹⁰ | 0.771 | 3.8×10⁻⁶ | 0.793 |
| | | | | | ApoA1* | 1.9×10⁻⁶ | 0.668 | 4.0×10⁻⁹ | 0.718 |
| 16:56996009 | CETP | 0.0017 | 0.00003 | 56.7 (7.9-408.3) | ApoA1 | 2.6×10⁻⁸ | 0.834 | 1.8×10⁻⁴ | 1.034 |
| | | | | | HDL-C | 1.1×10⁻¹⁴ | 0.946 | 8.8×10⁻²¹ | 1.217 |
| 16:68013570 | DPEP3 | 0.0099 | 0.00044 | 22.5 (12.9-39.1) | HDL-C | 1.6×10⁻⁷ | -0.295 | 7.2×10⁻¹⁵ | -0.373 |
| | | | | | ApoA1* | 5.2×10⁻⁶ | -0.294 | 4.0×10⁻⁷ | -0.253 |
| 16:68732169 | CDH3 | 0.0044 | 0.00064 | 6.9 (4.2-11.2) | Pyr* | 3.7×10⁻⁵ | 0.417 | 6.6×10⁻¹⁰ | 0.471 |
| 17:6599157 | SLC13A5 | 0.00091 | 0 | ∞ | Cit | 1.3×10⁻⁹ | 1.294 | 9.5×10⁻¹² | 1.309 |
| 17:7129898 | DVL2 | 0.02 | 0.02 | 1 (0.9-1.1) | Val* | 4.2×10⁻⁵ | -0.239 | 5.7×10⁻⁹ | -0.232 |
| 17:39135270 | KRT40 | 0.00013 | 0 | ∞ | HDL-C | 3.2×10⁻⁸ | 2.416 | NA | NA |
| 17:41062979 | G6PC | 0.025 | 0 | ∞ | MUFA | 4.4×10⁻⁷ | 0.275 | 3.5×10⁻¹ | 0.067 |
| | | | | | Glol* | 5.8×10⁻⁶ | 0.218 | 4.1×10⁻⁷ | 0.183 |
| | | | | | CRP* | 1.6×10⁻⁵ | 0.175 | 4.0×10⁻⁹ | 0.185 |
| | | | | | TotTG* | 1.0×10⁻⁶ | 0.23 | 1.3×10⁻⁷ | 0.197 |
| 17:41926216 | CD300LG | 0.00034 | 0 | ∞ | HDL-C | 4.8×10⁻¹⁴ | 2.061 | 4.9×10⁻² | 0.801 |
| | | | | | HDL2-C | 1.3×10⁻⁷ | 2.154 | NA | NA |
| | | | | | ApoA1 | 8.1×10⁻⁸ | 1.694 | NA | NA |
| 18:47091686 | LIPG | 0.0025 | 0 | ∞ | HDL2-C* | 1.2×10⁻⁵ | 0.579 | 5.6×10⁻¹⁰ | 0.624 |
| | | | | | PC* | 3.1×10⁻⁶ | 0.624 | 1.1×10⁻⁸ | 0.578 |
| | | | | | TotPG* | 9.0×10⁻⁶ | 0.594 | 1.1×10⁻⁷ | 0.538 |
| 19:10683762 | AP1M2 | 0.015 | 0.00009 | 167 (41.6-668.5) | ApoB | 5.8×10⁻⁸ | -0.282 | 1.5×10⁻³ | -0.199 |
| | | | | | IDL-C* | 1.1×10⁻⁶ | -0.289 | 6.9×10⁻¹⁴ | -0.319 |
| | | | | | IDL-P* | 2.1×10⁻⁶ | -0.281 | 8.5×10⁻¹⁴ | -0.318 |
| | | | | | Remnant-C* | 8.0×10⁻⁶ | -0.268 | 2.7×10⁻¹² | -0.301 |
| 19:11350904 | ANGPTL8 | 0.0025 | 0 | ∞ | HDL2-C* | 3.4×10⁻⁶ | 0.564 | 1.1×10⁻⁸ | 0.574 |
| 19:49318380 | HSD17B14 | 0.046 | 0.05 | 0.9 (0.8-1.0) | Val* | 3.4×10⁻⁵ | -0.152 | 2.1×10⁻⁷ | -0.144 |
| 20:24994201 | ACSS1 | 0.0026 | 0 | ∞ | Ace* | 1.3×10⁻⁵ | 0.626 | 2.1×10⁻¹² | 0.631 |

(#) Non-Finnish European (NFE) MAF taken from gnomAD v2.1 control exomes excluding Estonian or Swedish individuals.
0: variant present in gnomAD, but not in NFE controls. NA: variant not present in gnomAD.
(*) Association only reaches significance in combined analysis.
(**) Replication P-values<0.05 are highlighted in bold.
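Table 1 reports MAF ratios with 95% CIs but does not say how the intervals were computed. One standard option, assumed here, is a Wald interval on the log ratio of two allele frequencies (the Katz method) computed from allele counts; `maf_ratio_ci` is a hypothetical helper. It returns an infinite ratio with no interval when the allele is absent from the comparison sample, as for the ∞ entries in the table.

```python
from math import exp, inf, sqrt
from statistics import NormalDist


def maf_ratio_ci(ac1, an1, ac2, an2, level=0.95):
    """Ratio of allele frequencies (ac1/an1) / (ac2/an2) with a log-scale Wald CI.

    ac = alternative allele count, an = total allele number (2 x individuals).
    """
    if ac2 == 0:
        return inf, None, None  # allele absent from the comparison sample
    if ac1 == 0:
        return 0.0, None, None
    ratio = (ac1 / an1) / (ac2 / an2)
    se_log = sqrt(1 / ac1 - 1 / an1 + 1 / ac2 - 1 / an2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ratio, ratio * exp(-z * se_log), ratio * exp(z * se_log)
```

Because the interval is symmetric on the log scale, very rare NFE alleles (small ac2) produce the wide, right-skewed intervals seen for variants such as NR1H3 and CES1.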
Figure 2
Allelic enrichment in the Finnish population and its effect on genetic discovery.
A) Relationship between MAF and estimated effect size for associations discovered in FinMetSeq. Each variant reaching significance in FinMetSeq is plotted, with associations in Table 1 represented by dark blue points (FinMetSeq MAF) and green points (NFE MAF). Purple lines indicate 80% power curves for sample sizes of 10,000 and 20,000 at α=5×10⁻⁷.
B) Same plot as in A, highlighting the variants in Table 1 only reaching significance in the combined analysis.
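The power curves in panels A and B relate MAF, effect size, and sample size. A standard large-sample approximation for an additive single-variant test of a standardized (inverse-normal transformed) trait, assumed in this sketch, gives a non-centrality of N·2p(1-p)·β², so the sample size needed for a given power scales as 1/(2p(1-p)β²). The functions are illustrative and ignore both the variance explained by the variant and winner's curse.

```python
from statistics import NormalDist


def required_n(maf, beta, alpha=5e-7, power=0.8):
    """Approximate N for a two-sided additive test of a trait in SD units."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / (2 * maf * (1 - maf) * beta ** 2)


def power_at(n, maf, beta, alpha=5e-7):
    """Approximate power at sample size n (ignores the negligible opposite tail)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    ncp = (n * 2 * maf * (1 - maf) * beta ** 2) ** 0.5
    return nd.cdf(ncp - z_alpha)


n_fin = required_n(0.023, 0.163)    # DLK1, FinMetSeq MAF (Table 1)
n_nfe = required_n(0.00013, 0.163)  # same effect size at the NFE MAF
print(round(n_fin), round(n_nfe))
```

Under these assumptions, the DLK1 values from Table 1 (MAF 0.023 in FinMetSeq versus 0.00013 in NFE, |β|≈0.163 in the combined analysis) give roughly 29,000 individuals at the Finnish frequency and roughly 5 million at the NFE frequency, in line with the estimate that comparable studies outside Finland would need hundreds of thousands to millions of participants.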
Anthropometric traits
A predicted damaging missense variant (p.Arg94Cys) in THBS4, 45X more frequent in FinMetSeq than in NFE, was associated in the combined analysis with a mean 5.9 kg decrease in body weight. THBS4 encodes thrombospondin 4, a matricellular protein found in blood vessel walls and highly expressed in heart and adipose tissue[20]. THBS4 may regulate vascular inflammation[21] and has been implicated in heart disease risk[22].
A predicted damaging missense variant (p.Val104Met) in DLK1, 177X more frequent in FinMetSeq than in NFE, is associated in the combined analysis with a mean 1.3 cm decrease in height. DLK1 encodes Delta-Like Notch Ligand 1, an epidermal growth factor-like protein that interacts with fibronectin and inhibits adipocyte differentiation. Uniparental disomy of DLK1 causes Temple and Kagami-Ogata syndromes, characterized by growth restriction, hypotonia, joint laxity, motor delay, and early onset of puberty[23]. Paternally inherited common variants near DLK1 are associated with childhood obesity, type 1 diabetes, age at menarche, and precocious puberty[24-26]. Homozygous null mutations in the mouse ortholog Dlk-1 lead to embryos with reduced size, skeletal length, and lean mass[27]; in Darwin’s finches, SNVs at this locus have a strong effect on beak size[28].
HDL-C
A predicted deleterious missense variant (p.Arg112Trp) in CD300LG is associated in FinMetSeq with a mean 0.95 mmol/l increase in HDL-C and with increased HDL2-C and ApoA1. This variant, absent in NFE, has the opposite direction of effect to a previously reported deleterious missense variant in this gene[29], which encodes a type I cell surface glycoprotein.
Amino acids
A stop-gain variant (p.Arg722X) in ALDH1L1, absent in NFE, is associated in FinMetSeq with reduced serum glycine levels; low glycine may increase risk for cardiometabolic disorders[30,31]. ALDH1L1 encodes 10-formyltetrahydrofolate dehydrogenase, which competes with serine hydroxymethyltransferase to alter the ratio of serine to glycine in the cytosol. Gene-based tests suggest that additional PTVs and missense variants in ALDH1L1 also alter glycine levels (P=1.4×10⁻²⁰, Extended Data Fig. 6, Supplementary Table 9).
Extended Data Fig. 6
Gene-based association of extremely rare variants in ALDH1L1 with glycine levels.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test, along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 8,206.
Ketone bodies
A predicted damaging missense variant (p.Phe517Ser) in ACSS1, absent in NFE, is associated in the combined analysis with increased serum acetate levels. ACSS1 encodes an acyl-coenzyme A synthetase that plays a role in the conversion of acetate to acetyl-CoA. In rodents, increased acetate levels lead to obesity, insulin resistance, and metabolic syndrome[32].
Trait-associations and disease endpoints
Genotype data from FinnGen[33] enabled us to test whether deleterious variants responsible for our novel trait associations contribute to related disease endpoints. We examined 22 diseases for the 25 available variants in Table 1; three variants were associated with diseases in FinnGen at a Bonferroni threshold of P<0.05/(22×25)=9.1×10⁻⁵ (Supplementary Table 14).
A predicted damaging missense variant (p.Ser32Pro) in KRT40, associated in FinMetSeq with elevated HDL-C but absent in NFE, is associated in FinnGen with increased pancreatitis risk. While this is the first disease association reported for KRT40, type I keratins regulate exocrine pancreas homoeostasis[34]. A 29 bp deletion causing a frameshift in FAM151A is associated in FinMetSeq with decreased total cholesterol in IDL and decreased IDL particle concentration, is 6.7X more frequent in FinMetSeq than in NFE, and is associated in FinnGen with decreased risk of myocardial infarction. Interpretation of this association is complicated because the variant also lies in an overlapping gene (ACOT11) involved in fatty acid metabolism and is <1 Mbp from a cardioprotective variant in PCSK9. Finally, a predicted damaging missense variant (p.Arg65Trp) in DBH, associated in the combined analysis with a mean 1.0 mmHg decrease in diastolic blood pressure and 23.8X more frequent in FinMetSeq than in NFE, is associated in FinnGen with decreased risk of hypertension. Distinct loci in this gene, and gene-based tests of it, have been associated with mean arterial pressure[35,36].
Replication outside Finland
To assess the generalizability of these novel associations, we attempted to replicate associations from our combined analysis in the UK Biobank (UKB). Across eight anthropometric and blood pressure traits for which UKB data are publicly available, our combined analysis identified 31 trait-variant associations, of which 23 were present in UKB. Twenty of the 23 associations involved variants with MAF>1% in FinMetSeq and comparable frequency in UKB; 15 (75%) of these showed association in UKB at P<0.05/23=2.2×10⁻³. The three rare variants in this analysis were all >10x more frequent in FinMetSeq than in UKB; none were associated in UKB (Supplementary Table 15). However, even after adjusting for winner’s curse[37], we had <50% power to detect these associations in UKB, consistent with the argument that extremely large samples will be needed in other populations to achieve the power for rare-variant association studies that we observed in Finland.
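Both Bonferroni thresholds used in the FinnGen and UKB analyses are the family-wise α divided by the number of tests. The short check below (the function name is illustrative) reproduces them; note that 0.05/550 rounds to 9.1×10⁻⁵.

```python
def bonferroni_threshold(alpha, *test_counts):
    """Per-test significance threshold: alpha divided by the total number of tests."""
    total = 1
    for count in test_counts:
        total *= count
    return alpha / total


finngen = bonferroni_threshold(0.05, 22, 25)  # 22 diseases x 25 variants
ukb = bonferroni_threshold(0.05, 23)          # 23 trait-variant associations in UKB
print(f"{finngen:.1e} {ukb:.1e}")             # prints: 9.1e-05 2.2e-03
```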
Enriched variants cluster geographically
Given the concentration of Finnish Disease Heritage mutations within regions of late-settlement Finland[38], we hypothesized that trait-associated variants discovered through FinMetSeq might also cluster geographically. Principal component analysis supported this hypothesis, revealing broad-scale population structure within late-settlement regions among 14,874 unrelated FinMetSeq participants with known parental birthplaces (Extended Data Fig. 7). Carriers of PTVs and missense alleles showed more clustering of parental birthplaces than carriers of synonymous alleles, even after adjusting for MAC (Supplementary Tables 16A, B).
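The clustering statistic behind this comparison is not specified here. As an illustration only, one simple measure is the mean pairwise great-circle distance between carriers' parental birthplaces, compared with randomly drawn sets of the same size from the study population (which matches on MAC when each carrier contributes one allele copy). The sketch below assumes that statistic and permutation scheme; the names and any example coordinates are hypothetical and do not describe the analysis behind Supplementary Tables 16A, B.

```python
import random
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def mean_pairwise_distance(coords):
    """Average distance over all pairs of points (needs at least two points)."""
    pairs = list(combinations(coords, 2))
    return sum(haversine_km(a, b) for a, b in pairs) / len(pairs)


def clustering_test(carrier_coords, population_coords, n_perm=1000, seed=0):
    """Compare carriers' mean pairwise distance with random same-size sets."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(carrier_coords)
    k = len(carrier_coords)
    null = [mean_pairwise_distance(rng.sample(population_coords, k)) for _ in range(n_perm)]
    # one-sided empirical P: how often a random set is at least as tightly clustered
    as_tight = sum(d <= observed + 1e-9 for d in null)
    return observed, (1 + as_tight) / (n_perm + 1)
```

A small empirical P indicates that carriers' parental birthplaces are closer together than expected for random individuals, the pattern reported for PTV and missense carriers relative to synonymous carriers.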
Extended Data Fig. 7
Population structure of the FinMetSeq dataset, by region.
Population structure, by region, from principal components analysis of exome sequencing variant data (MAF > 1%), for 14,874 unrelated individuals with known parental birthplaces. Color indicates individuals with both parents born in the same region; gray indicates individuals with different parental birth regions, or missing information for one parent. Abbreviations for the regions: Usm, Uusimaa; Swf, Southwest Finland; Stk, Satakunta; Khm, Kanta-Hame; Prk, Pirkanmaa; Phm, Paijat-Hame; Kyl, Kymenlaakso; SKa, Southern Karelia; Nka, Northern Karelia; SSv, Southern Savonia; NSv, Northern Savonia; Ctf, Central Finland; SOs, Southern Ostrobothnia; Osb, Ostrobothnia; COs, Central Ostrobothnia; NOs, Northern Ostrobothnia; Kai, Kainuu; Lap, Lapland; X, split parental birthplaces. Large solid circles represent the center of each region.
To analyze the distribution of variants within late-settlement Finland, we delineated geographically distinct population clusters using haplotype sharing among 2,644 unrelated individuals with both parents born in the same municipality (Methods, Extended Data Fig. 8). We compared variant counts across functional classes and frequencies between an early-settlement reference cluster and 12 clusters containing ≥100 individuals (Extended Data Fig. 9, Supplementary Tables 17, 18). Clusters representing the most heavily bottlenecked late-settlement regions (Lapland and Northern Ostrobothnia) displayed a deficit of singletons and enrichment of intermediate frequency variants compared to other clusters.
Extended Data Fig. 8
Hierarchical clustering tree produced by fineSTRUCTURE.
We identified 16 subpopulations within the FinMetSeq dataset by applying a haplotype-based clustering algorithm, fineSTRUCTURE, on 2,644 unrelated individuals born by 1955 whose parents were both born in the same municipality (Methods). Each subpopulation is named based on the most common parental birth location among its members, with the following abbreviations: NKa, North Karelia; NSv, North Savonia; SOs, South Ostrobothnia; NOs, North Ostrobothnia; Kai, Kainuu; Lap, Lapland; SuK, Surrendered Karelia. A map of Finland with regions labeled is supplied for reference. If multiple subpopulations share the same location label, the subpopulation is further distinguished with a numeral. NSv3 is used as an internal reference in enrichment analysis. See Supplementary Table 17 for more detailed demographic descriptions of each subpopulation.
Extended Data Fig. 9
Regional variation in allele frequencies by functional annotation.
Enrichment of variants by allelic class in regional subpopulations of late-settlement Finland (defined in Supplementary Table 17). Each bin represents the ratio of variants in the subpopulation to those in the reference subpopulation (NSv3), after down-sampling the frequency spectra of all populations to 200 chromosomes. Pink cells represent an enrichment (ratio >1); blue cells represent a depletion (ratio <1). Sample sizes, confidence intervals for each enrichment ratio, and the corresponding P-values are presented in Supplementary Table 18. The results are consistent with multiple bottlenecks in late-settlement Finland, particularly for populations in Lapland and Northern Ostrobothnia.
Variants >10× enriched in FinMetSeq compared to NFE displayed particularly strong geographical clustering (Supplementary Table 19). We further characterized clustering for FinMetSeq-enriched trait-associated variants by comparing mean distances between birthplaces of parents of minor allele carriers to those of non-carriers (Supplementary Table 20). Most such variants were highly localized. For example, for rs780671030 in ALDH1L1, the mean distance between parental birthplaces is 135 km for carriers and 250 km for non-carriers (P<1.0×10⁻⁷, Fig. 3A).
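The carrier versus non-carrier distance comparison can be sketched as follows. This sketch uses great-circle (haversine) distances, the mean pairwise distance within a group, and a one-sided label-permutation test. These choices, the coordinates, and all function names are hypothetical, and the P-values in Supplementary Table 20 may have been computed differently. Note that a permutation P-value cannot fall below 1/(n_perm+1). A bound such as P<1.0×10⁻⁷ therefore requires on the order of 10⁷ permutations.

```python
import math
import random
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def mean_pairwise_km(points):
    """Mean distance over all unordered pairs of parental birthplaces."""
    pairs = list(combinations(points, 2))
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)

def clustering_test(carrier_pts, noncarrier_pts, n_perm=1000, seed=0):
    """One-sided permutation test: are carriers' parental birthplaces closer
    together than those of a random group of the same size?"""
    observed = mean_pairwise_km(carrier_pts)
    pooled = list(carrier_pts) + list(noncarrier_pts)
    rng = random.Random(seed)
    k = len(carrier_pts)
    hits = sum(
        mean_pairwise_km(rng.sample(pooled, k)) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimated P value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```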
Figure 3
Geographical clustering of associated variants.
A) Example of geographical clustering for a novel trait-associated variant (Table 1). The map shows birth locations of all 113 parents of carriers (orange) and 113 randomly selected parents of non-carriers (blue) of the minor allele for rs780671030 in ALDH1L1.
B) FDH mutations (N=38) geographically cluster (by parental birthplace) similarly to trait-associated variants (Table 1) that are >10× more frequent in FinMetSeq than in NFE (N=12), and more than enriched variants from our combined analysis (N=7). For all variants, carriers clustered more than non-carriers (center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers).
Finally, we identified comparable geographic clustering between carriers of 35 Finnish Disease Heritage mutations and carriers of FinMetSeq-enriched trait-associated variants (Fig. 3B, Methods). Clustering was dramatically greater than that observed for non-carriers of both sets of variants, suggesting that rare trait-associated variants may be much more unevenly distributed geographically than previously appreciated.
Discussion
We demonstrate that a well-powered exome sequencing study of deeply phenotyped individuals can identify numerous rare variants associated with medically relevant quantitative traits. The variants we identified provide a useful starting point for studies aimed at uncovering biological mechanisms and fostering clinical translation. The power of this study to discover rare-variant associations derives from the numerous deleterious variants that are enriched in, or unique to, Finland. Prioritizing the sequencing of multiple population isolates that have expanded from recent bottlenecks offers a strategy for scaling up the discovery of rare-variant associations[7,39-41]. Genetic drift causes a different set of alleles to pass through each population-specific bottleneck, enriching some variants and depleting others. Sequencing well-phenotyped samples across multiple isolates could therefore identify numerous rare-variant associations and rapidly increase our understanding of the genetic architecture of complex traits.

Our results support recent suggestions of continuity between the genetic architectures of complex traits and disorders classically considered monogenic[42,43]. We identified numerous deleterious variants with large effects on quantitative traits that show geographical clustering comparable to that of the mutations responsible for the Finnish Disease Heritage.

Using a Finland-specific reference panel[44] to impute FinMetSeq variants into array-genotyped samples from three other Finnish cohorts enabled us to identify additional novel associations. However, deleterious trait-associated variants in FinMetSeq cluster within limited geographical regions. In addition, we could not follow up >700 sub-threshold FinMetSeq associations whose variants were absent from the Finnish imputation reference panel. Together, these observations emphasize the importance of representing regional subpopulations in such reference panels to account for fine-scale population structure.

The value of rare-variant studies in population isolates will depend on the richness of phenotypes in sequenced cohorts from these populations. For example, we associated <100 of the >24,000 deleterious, highly enriched variants identified in FinMetSeq with at least one of the 64 quantitative traits studied here. The associations we identified to disease endpoints in FinnGen hint at the discoveries that will be possible when that database reaches its full size of 500,000 participants. The insights gained from such efforts will accelerate the implementation of precision health and inform projects in more heterogeneous populations, which are still at an early stage[45].
Methods
METSIM and FINRISK studies: designs, phenotypes, and sequenced participants
METSIM is a single-site study investigating cardiometabolic disorders and related traits in 10,197 men randomly selected from the population register of Kuopio, Eastern Finland, aged 45 to 73 years at initial examination from 2005 to 2010[15,46]. We attempted exome sequencing of all METSIM study participants.

FINRISK is a series of health examination surveys based on random population samples from five (six in 2002) geographical regions of Finland, carried out every five years beginning in 1972[47]. For exome sequencing, we chose 10,192 participants in the 1992-2007 FINRISK surveys from northeastern Finland (former provinces of North Karelia, Oulu, and Lapland).

All participants in both studies provided informed consent, and study protocols were approved by the Ethics Committees at participating institutions (National Public Health Institute of Finland; Hospital District of Helsinki and Uusimaa; Hospital District of Northern Savo). All relevant ethics committees approved this study.
Selection of traits, harmonization, exclusions, covariate adjustment, and transformation
Of the 257 quantitative traits measured in both METSIM and FINRISK, we selected 64 for association analysis in FinMetSeq based on clinical relevance for cardiovascular and metabolic health (Supplementary Tables 4, 5). We applied the following exclusions:

- from all analyses, individuals with type 1 diabetes and women who were pregnant at the time of phenotyping;
- from analyses of glycemic traits, individuals with T2D;
- for traits influenced by food consumption, individuals who had not fasted for at least 8 hours after their last meal.

A complete list of exclusions is in Supplementary Table 5. We adjusted measured values of systolic and diastolic blood pressure for individuals on antihypertensive medication at the time of testing[48,49], and serum lipid measures for individuals on lipid-regulating medications[50,51]. Trait adjustments are listed in Supplementary Table 5.

We prepared quantitative traits for association analysis separately for METSIM and FINRISK by linear regression on trait-specific covariates, after log-transforming skewed variables. Covariates for regression analyses were age and age² (METSIM), and sex, age, age², and cohort year (FINRISK). Trait transformations and trait-specific covariates are listed in Supplementary Table 5. Several traits were also adjusted for sex hormone treatment, defined as use of contraceptives or hormone replacement therapy by women. We transformed residuals from these initial regression analyses to normality using inverse normal scores.
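The final step above, transforming regression residuals to inverse normal scores, can be sketched as a rank-based inverse normal transform. The sketch starts from residuals that have already been computed. It assumes the Blom offset c=3/8 and average ranks for ties, since the text does not specify either choice. The function name is hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence

def rank_inverse_normal(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse normal transform of trait residuals.

    Uses the Blom offset by default (an assumption). Tied values receive the
    average of their ranks, so they map to the same normal score.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run of tied values.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    # Map each rank to a standard normal quantile.
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

Because the scores depend only on ranks, extreme residuals cannot dominate the association test. This robustness comes at the cost of losing the original trait units, which is why effect sizes are later reported in SD units.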
Exome sequencing
We carried out exome sequencing in two phases.

Phase 1. We quantified 10,379 DNA samples with PicoGreen (ThermoFisher Scientific) and randomly parsed samples with adequate DNA (>250 ng) into cohort-specific files. We then re-arrayed samples to ensure equal numbers of METSIM and FINRISK samples on each 96-well plate, alternating samples between studies in consecutive positions within and across plates, to minimize between-study batch effects. Using 100-250 ng input DNA, we constructed dual-indexed libraries using the HTP Library Kit (KAPA Biosystems, target insert size of 250 bp), pooling twelve libraries prior to hybridization to the SeqCap EZ HGSC VCRome (Roche) exome reagent. We estimated the concentration of each captured library pool by qPCR (Kapa Biosystems) to produce appropriate cluster counts for the HiSeq2000 platform (Illumina). We then generated 2×100 bp paired-end sequence data, yielding ~6 Gb per sample, to achieve a coverage depth of ≥20× for ≥70% of targeted bases for every sample.

Phase 2. We quantified, prepared, pooled, and captured 9,937 samples as in Phase 1, but generated 2×125 bp paired-end sequencing reads on the HiSeq2500 1T to achieve the same coverage as in Phase 1.
Contamination detection, sequence alignment, sample QC, and variant calling
We aligned sequence reads to human genome reference build 37 (bwa-mem, v0.7.7), realigned indels (GATK[52] IndelRealigner v2.4), and marked duplicates (picard MarkDuplicates, v1.113; http://broadinstitute.github.io/picard) and overlapping bases (BamUtil clipOverlap v1.0.11; http://genome.sph.umich.edu/wiki/BamUtil:_clipOverlap).

For each sample with SNV array data, we required SNV genotype array concordance >90%. We excluded samples with estimated contamination >3% or with sample swaps relative to existing genotype data (verifyBamID[53], v1.1.1; Supplementary Table 1).

We called SNVs and short indels with GATK[52] HaplotypeCaller (v3.3, using recommended best practices) for all targeted exome bases and 500 bp of sequence upstream and downstream of each target region. We merged calls in batches of 200 individuals using CombineGVCFs and recalled genotypes for all individuals at all variable sites with GenotypeGVCFs.

After merging genotypes for the 19,378 samples that passed preliminary QC checks, we filtered SNVs and indels separately using the recommended best practices for Variant Quality Score Recalibration (VQSR). We trained the VQSR model on the true-positive variants in the GATK resource bundle (v2.5; build 37), after restricting to sites in targeted exome regions. For both SNVs and indels, we retained variants in the VQSR tranche that recovered ≥99% of the true-positive sites used to train the model.

Following initial variant filtering, we decomposed multi-allelic variants into bi-allelic variants, left-aligned indels, and dropped redundant variants using vt[54] (version 0.5). We filtered variants with >2% missing calls and/or Hardy-Weinberg P<10⁻⁶. We additionally removed variants with an overall allele balance (alternate AC/sum of total AC) <30% in genotyped samples. We excluded 86 individuals with >2% missing variant calls, yielding a final analysis set of 19,292 individuals.
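The post-calling variant filters (missingness, Hardy-Weinberg, allele balance) can be sketched as a single predicate. The thresholds come from the text. The remaining details are assumptions for illustration:

- The text does not say which Hardy-Weinberg test was used. This sketch uses the 1-df chi-square approximation; an exact test is usually preferred for rare variants.
- Allele balance is computed from read counts summed over genotyped samples.
- The field and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class VariantSummary:
    n_hom_ref: int
    n_het: int
    n_hom_alt: int
    n_missing: int
    alt_reads: int    # alternate-allele reads summed over genotyped samples (assumed)
    total_reads: int  # all reads summed over the same samples (assumed)

def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Hardy-Weinberg P value from a 1-df chi-square test (approximation)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chisq = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))

def passes_variant_qc(v: VariantSummary, max_missing: float = 0.02,
                      min_hwe_p: float = 1e-6, min_allele_balance: float = 0.30) -> bool:
    called = v.n_hom_ref + v.n_het + v.n_hom_alt
    if v.n_missing / (called + v.n_missing) > max_missing:
        return False  # >2% missing calls
    if hwe_chisq_p(v.n_hom_ref, v.n_het, v.n_hom_alt) < min_hwe_p:
        return False  # Hardy-Weinberg P < 1e-6
    allele_balance = v.alt_reads / v.total_reads if v.total_reads else 0.0
    return allele_balance >= min_allele_balance  # drop AB < 30%
```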
Array genotypes, genotype imputation, and integrated exome+imputation panel
For all but 1,488 participants (57 METSIM, 1,431 FINRISK), previously generated array genotypes were available[17,55], with which we generated three datasets: (1) a merged array-based call set of all variants present in ≥90% of array-genotyped individuals across both cohorts; (2) a merged array-based Haplotype Reference Consortium (HRC) v1.1 imputed dataset using the Michigan Imputation Server[56,57]; (3) an integrated data set containing HRC imputed genotypes and exome-sequence variants (excluding all individuals without array data, and using the sequence-based genotypes where there was overlap between sequenced and imputed genotypes).
Annotation
We annotated the final set of sequence variants passing QC using Ensembl’s variant effect predictor (VEP v76)[58] employing five in silico algorithms to predict the functional impact of missense variants: PolyPhen2 HumDiv and HumVar[59], LRT[60], MutationTaster[61], and SIFT[62].
Association testing
Single variants
We carried out single-variant association tests for transformed trait residuals with genotype dosages for variants with MAC≥3 assuming an additive genetic model, using the EMMAX[63] linear mixed model approach, as implemented in EPACTS (v3.3.0; http://genome.sph.umich.edu/wiki/EPACTS), to account for relatedness between individuals. We used genotypes for sequenced variants with MAF≥1% to construct the genetic relationship matrix (GRM).
Conditioning on associated variants from prior GWAS
To differentiate association signals identified here from known associations, for each trait we performed exome-wide association analysis conditioning on variants previously associated (P<10⁻⁷) with that trait in the EBI GWAS catalog (https://www.ebi.ac.uk/gwas/downloads; December 4, 2016 version)[64], publications, or manuscripts in preparation[55,65-67]. The keywords from the GWAS catalog we used to assign known variants to each trait are in Supplementary Table 21. We also manually curated published associations for specific metabolites[65,68].

Using the combined HRC+exome panel, we pruned each trait-specific list of associated variants (“GWAS variants”) based on linkage disequilibrium (LD) (r²>0.95). Of 23 GWAS variants absent in the HRC+exome panel, we identified a proxy (r²>0.80) variant for 17; we excluded the remaining six variants from conditional analysis. The variants included in conditional analysis are listed in Supplementary Table 22. We extracted genotypes for variants used in conditional analysis from the HRC+exome panel and converted dosages to alternate allele counts by rounding to the nearest integer (0, 1, or 2). For conditional analyses, we imputed missing genotypes for the individuals without array data using the mean genotype. We then ran association analysis using the same linear mixed model approach as in unconditional analysis, but including the complete set of pruned GWAS variants as covariates in the association test. We then evaluated the novelty of conditional associations by searching OMIM, ClinVar, and the literature.
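The covariate preparation for conditional analysis has two steps: rounding dosages to allele counts, then mean-imputing individuals without array data. Both can be sketched per GWAS variant as below. The text says only "nearest integer", so the round-half-up rule at exactly 0.5 and 1.5 is an assumption. The function name is hypothetical.

```python
import math
from typing import List, Optional

def hard_call_and_mean_impute(dosages: List[Optional[float]]) -> List[float]:
    """Round imputed dosages to 0/1/2 and fill missing genotypes with the mean.

    None marks an individual without array data (hence no imputed dosage).
    """
    # Round half up and clamp to [0, 2]; Python's round() would instead send
    # 0.5 to 0 and 1.5 to 2 (round-half-to-even).
    calls = [None if d is None else min(2, max(0, math.floor(d + 0.5)))
             for d in dosages]
    observed = [c for c in calls if c is not None]
    mean_call = sum(observed) / len(observed) if observed else 0.0
    return [float(mean_call) if c is None else float(c) for c in calls]
```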
Defining loci
To identify the number of distinct associations for each trait, we performed LD clumping using Swiss (https://github.com/welchr/swiss) on variants with (1) unconditional P<5×10⁻⁷ or (2) both unconditional and conditional P<5×10⁻⁵ for at least one trait. For each variant in this subset, we provided Swiss with the minimum unconditional P-value across all traits. The clumping procedure starts with the variant with the smallest P-value, merges into one locus all variants within ±1 Mbp that have r²>0.5 with the index variant, and iterates this process until no variants remain.
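The greedy clumping procedure described above can be sketched as follows. This is not Swiss's implementation. The variant tuples and the `r2` lookup are hypothetical inputs, and in practice r² would come from a reference panel such as the HRC+exome panel.

```python
from typing import Callable, List, Tuple

# (variant id, chromosome, position in bp, minimum P across traits)
Variant = Tuple[str, str, int, float]

def clump(variants: List[Variant], r2: Callable[[str, str], float],
          window_bp: int = 1_000_000, r2_min: float = 0.5) -> List[List[str]]:
    """Greedy LD clumping: repeatedly take the most significant remaining
    variant as an index and absorb nearby variants in LD with it."""
    remaining = sorted(variants, key=lambda v: v[3])  # smallest P first
    loci = []
    while remaining:
        index, rest = remaining[0], remaining[1:]
        members, unassigned = [index[0]], []
        for v in rest:
            nearby = v[1] == index[1] and abs(v[2] - index[2]) <= window_bp
            if nearby and r2(index[0], v[0]) > r2_min:
                members.append(v[0])  # same locus as the index variant
            else:
                unassigned.append(v)
        loci.append(members)
        remaining = unassigned
    return loci
```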
Calculating effects and variance explained of individual variants
For novel variants highlighted in Table 1, we evaluated the effect of each variant on trait values by calculating the mean trait value in carriers and non-carriers. Because the effect estimates from our association tests are standardized, we calculated the variance explained by a given variant as 2f(1−f)β², where f is the minor allele frequency and β is the estimated effect size. The variance explained is in Supplementary Table 10.
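As a worked example of the formula 2f(1−f)β², consider a hypothetical variant with f=0.01 and β=0.5 SD. It explains 2×0.01×0.99×0.25 ≈ 0.5% of trait variance. A minimal helper, with an illustrative name:

```python
def variance_explained(f: float, beta: float) -> float:
    """Fraction of trait variance explained by one additive variant.

    f    -- minor allele frequency, 0 < f <= 0.5
    beta -- per-allele effect in SD units of the transformed trait
    """
    if not 0 < f <= 0.5:
        raise ValueError("f must be a minor allele frequency in (0, 0.5]")
    # 2f(1-f) is the variance of the additive genotype code under HWE.
    return 2 * f * (1 - f) * beta ** 2
```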
Gene-based testing
We carried out gene-based association tests using the mixed model implementation of SKAT-O[69], considering three different but nested sets of variants (variant “masks”):
(1) PTVs at any allele frequency with VEP annotations frameshift_variant, initiator_codon_variant, splice_acceptor_variant, splice_donor_variant, stop_lost, or stop_gained;
(2) PTVs included in (1) plus missense variants with MAF<0.1% scored as “damaging” or “deleterious” by all five functional prediction algorithms;
(3) PTVs included in (1) plus missense variants with MAF<0.5% scored as “damaging” or “deleterious” by all five algorithms.
For each trait and mask, we only tested genes with at least two qualifying variants. Each mask contained a different number of genes with at least two qualifying variants: up to 7,996, 12,795, and 12,890 for the three masks, respectively. The exact number of genes tested varied by trait due to sample size. We first used a Bonferroni-corrected exome-wide threshold for 12,890 genes, which corresponds to P<3.88×10⁻⁶. Analogous to single-variant association, we passed genes meeting this threshold forward for additional consideration with hierarchical FDR correction, described below.
Hierarchical FDR correction for testing multiple traits and variants
To control for multiple testing across 64 traits, we adopted an FDR-controlling procedure[70] using a two-stage hierarchical strategy (described in Supplementary Information). Stage 1 identifies the set of R variants (or genes) associated with at least one trait (P<5×10⁻⁷ for single-variant unconditional results and P<3.88×10⁻⁶ for gene-based results), controlling genome-wide FDR across all variants at 0.05. Stage 2 identifies all traits associated with the discovered variants in a manner guaranteeing an average FDR<0.05.
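A generic two-stage procedure of this kind can be sketched in the style of Benjamini and Bogomolov. Stage 1 applies Benjamini-Hochberg (BH) across variants to a Simes combination of each variant's per-trait P-values. Stage 2 applies BH across traits within each selected variant at level qR/M, where M is the number of variants tested. This is an illustrative assumption, not necessarily the exact procedure in Supplementary Information. In particular, it omits the single-variant and gene-based P thresholds named above, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set

def bh_reject(pvals: Sequence[float], q: float) -> Set[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank  # largest rank passing its BH threshold
    return set(order[:k_max])

def simes(pvals: Sequence[float]) -> float:
    """Simes P value for the global null 'variant affects no trait'."""
    m = len(pvals)
    return min(p * m / rank for rank, p in enumerate(sorted(pvals), start=1))

def two_stage_fdr(pmatrix: Dict[str, List[float]], q: float = 0.05) -> Dict[str, List[int]]:
    """pmatrix maps variant id -> per-trait P values; returns selected trait indices."""
    variants = list(pmatrix)
    # Stage 1: BH across variants on per-variant Simes P values.
    stage1 = bh_reject([simes(pmatrix[v]) for v in variants], q)
    selected = [variants[i] for i in sorted(stage1)]
    if not selected:
        return {}
    # Stage 2: BH across traits within each selected variant at level q*R/M,
    # which controls the FDR averaged over the selected variants.
    level = q * len(selected) / len(variants)
    return {v: sorted(bh_reject(pmatrix[v], level)) for v in selected}
```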
Genotype validation
We validated exome sequence-based genotype calls using Sanger sequencing for METSIM carriers of 13 trait-associated very rare variants with MAF<0.1% in seven genes, finding concordance for 107 of 108 (99.1%) non-reference genotypes evaluated.
Replication in additional Finnish cohorts
We attempted to replicate significant single-variant associations (P<5×10⁻⁷) and follow up suggestive single-variant associations (P<5×10⁻⁵) using imputed array data from up to 24,776 individuals from three cohort studies: Northern Finland Birth Cohort 1966 (NFBC1966)[18], the Helsinki Birth Cohort Study (HBCS)[19], and FINRISK study participants not included in FinMetSeq[16,17].

For each cohort, prior to phasing we performed genotype quality control batch-wise using standard quality thresholds. We pre-phased array genotypes with Eagle[71] (v2.3) and imputed genotypes genome-wide with IMPUTE[72] (v2.3.1) using 2,690 sequenced Finnish genomes and 5,092 sequenced Finnish exomes as the reference. We assessed imputation quality by confirming sex, comparing sample allele frequencies with reference population estimates, and examining imputation quality (INFO score) distributions. We excluded any variant with INFO<0.7 within a given batch from all replication and follow-up analyses.

For each cohort, we matched, harmonized, covariate-adjusted, and transformed available phenotypes as described above for FinMetSeq. We then ran single-variant association using the EMMAX linear mixed model implemented in EPACTS, after generating kinship matrices from LD-pruned (command: plink --indep-pairwise 50 5 0.2) directly genotyped variants with MAF>5%.
Association to disease endpoints
From >1,100 disease endpoints available for analysis in FinnGen, we selected 22 we considered most relevant to the traits analyzed in FinMetSeq, identifying variant associations as described in Tabassum et al.[33].
Association replication in UK Biobank
For eight FinMetSeq anthropometric and blood pressure traits available in UKB (height, weight, BMI, hip circumference, waist circumference, fat percentage, systolic blood pressure, and diastolic blood pressure), we extracted trait-variant association statistics from http://www.nealelab.is/uk-biobank for variants reaching P<5×10⁻⁷ in our combined analysis. Seven of the eight traits had at least one associated variant, and 23 of the 31 trait-variant associations were available in UKB. A comparison of association results is in Supplementary Table 15.
Population genetic analyses
Identifying unrelated individuals
To identify nearly independent common SNVs, we removed SNVs with MAF<5% and pruned the remaining SNVs in windows of 50 SNVs, in steps of 5 SNVs, such that no pair of SNVs had r²>0.2. We used KING[73] to estimate pairwise relationships among the exome-sequenced individuals, removing one individual from each pair inferred by KING to have a relationship of 3rd degree or closer. This yielded 14,874 unrelated individuals for population genetic analyses.
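The windowed pruning step above (50-SNV windows, 5-SNV steps, r²>0.2) can be sketched as follows. This is a simplified illustration of what `plink --indep-pairwise 50 5 0.2` does, not PLINK's algorithm. When a pair exceeds the threshold, the sketch drops the later SNV, whereas PLINK uses allele frequency to decide which SNV to remove. The sketch also assumes the MAF<5% filter has already been applied. The function names are hypothetical.

```python
from typing import List, Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic vector: correlation undefined
    return sxy * sxy / (sxx * syy)

def prune_pairwise(genotypes: Sequence[Sequence[float]], window: int = 50,
                   step: int = 5, r2_max: float = 0.2) -> List[int]:
    """Return indices of SNVs kept; genotypes are 0/1/2 vectors in genomic order."""
    keep = [True] * len(genotypes)
    for start in range(0, len(genotypes), step):
        idx = [i for i in range(start, min(start + window, len(genotypes))) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # simplification: drop the later SNV
    return [i for i, k in enumerate(keep) if k]
```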
Enrichment of predicted-deleterious alleles in Finland
We assessed enrichment of predicted-deleterious alleles in Finland by comparing the 14,874 nearly unrelated FinMetSeq individuals to the 14,944 NFE control exomes in gnomAD, after removing NFE individuals from Estonia and Sweden, countries with substantial Finnish populations. We analyzed the two most common alleles at each site with base quality score >10, mapping quality score >20, and coverage equal to or greater than that found in ≥80% of variable sites (17.73× in FinMetSeq, 32.27× in gnomAD), resulting in ~38.6 Mbp for comparisons. We contrasted the proportional site frequency spectra for FinMetSeq and NFE for five functional variant categories (PTVs, missense, synonymous, UTR, and intronic variants) after down-sampling both datasets to 18,000 chromosomes.

We also assessed the enrichment of deleterious alleles within subpopulations of the FinMetSeq dataset. We applied Chromopainter and fineSTRUCTURE to 2,644 unrelated FinMetSeq individuals whose parents were both born in the same municipality to identify 16 subpopulation clusters[74] (Supplementary Information). As the reference population, we used the cluster with the highest proportion of members' parents born in early-settlement Finland (NSv3, Supplementary Table 17). We used the twelve clusters with >100 members in subsequent analyses (Supplementary Table 17). For each cluster, we then computed the ratio of its site frequency spectrum to that of the reference for PTVs, missense, and synonymous variants, after down-sampling both datasets to 200 haploid chromosomes. For each comparison, we assessed enrichment or depletion at a given allele count bin by an exact binomial test against a null of equal numbers of variants in the test and reference clusters.
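The down-sampling and enrichment comparison can be sketched in three steps:

1. Hypergeometric projection of each site's allele count onto a smaller number of chromosomes. This is one standard way to down-sample a site frequency spectrum; the text does not state which method was used.
2. A per-bin ratio of the test cluster's spectrum to the reference spectrum.
3. A two-sided exact binomial test of equal counts.

Rounding the expected (non-integer) projected counts before testing is an assumption. So is the use of a single chromosome count n for all sites, since coverage varies in real data. The function names are hypothetical.

```python
from math import comb
from typing import Iterable, List, Sequence

def project_sfs(allele_counts: Iterable[int], n: int, m: int) -> List[float]:
    """Expected SFS after down-sampling from n to m chromosomes.

    allele_counts -- focal-allele count at each variable site, out of n
    Entry j of the result is the expected number of sites with j copies in a
    random subsample of m chromosomes (hypergeometric projection).
    """
    if m > n:
        raise ValueError("cannot down-sample to more chromosomes than observed")
    total = comb(n, m)
    sfs = [0.0] * (m + 1)
    for k in allele_counts:
        for j in range(max(0, m - (n - k)), min(k, m) + 1):
            sfs[j] += comb(k, j) * comb(n - k, m - j) / total
    return sfs

def enrichment_ratios(test_sfs: Sequence[float], ref_sfs: Sequence[float]) -> List[float]:
    """Per-bin ratio of test to reference counts (NaN where the reference is empty)."""
    return [t / r if r > 0 else float("nan") for t, r in zip(test_sfs, ref_sfs)]

def binom_test_equal(x: int, y: int) -> float:
    """Two-sided exact binomial test of x vs y under H0: p = 1/2."""
    n = x + y
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(x, y) + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # symmetric null: double the smaller tail
```

In use, a projected bin count such as 37.4 PTVs in a test cluster versus 21.8 in NSv3 would be rounded (here, to 37 and 22) before calling `binom_test_equal`.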
Geographical clustering of predicted functionally deleterious alleles
We used genome-wide array genotype data on the 13,326 unrelated individuals for whom both exome sequence and array data were available to estimate heritability and genetic correlations for the 64 traits. We constructed a GRM with PLINK[75] (v.1.90b, https://www.cog-genomics.org/plink2) by applying additional filters for MAF>1% and genotype missingness rate <2% to the set of previously-used genotyped SNVs, leaving 205,149 SNVs for GRM calculation. We used the exact mixed model approach of biMM[76] (v.1.0.0, http://www.helsinki.fi/~mjxpirin/download.html) to estimate the heritability of our 64 traits and the genetic correlation of the 2,016 trait pairs.
Allele frequency comparisons between FinMetSeq and NFE from gnomAD.
A) Comparison of allele frequency distributions between FinMetSeq and gnomAD NFE. The comparison shows the excess of variants at higher frequency in Finland resulting from the multiple bottlenecks in Finnish population history.
B) Proportional site frequency spectra for FinMetSeq and gnomAD NFE by variant annotation class. In general, we find a depletion of variants in the rarest frequency class and an enrichment of variants in the intermediate to common frequency range. The site frequency spectra were down-sampled to 18,000 chromosomes for each dataset.
C) Comparison of MAFs for trait-associated variants in FinMetSeq and gnomAD NFE. Plotted in the gray background is a 2-D histogram of variants with non-zero allele frequencies in both gnomAD and FinMetSeq but no trait associations. Variants associated with at least one trait are colored and scaled inversely proportional to the logarithm of the association P-value. Variants >10× enriched in FinMetSeq compared to NFE are pink; those <10× enriched are blue. The dashed line is the line of equal frequency. Two-sided uncorrected P-values are from a regression of trait on the count of the alternative allele at each variant. The number of independent individuals used for each point is listed in Supplementary Table 5.
Heritability of and correlations between traits.
Traits are in the same order, clockwise in A, and left to right and top to bottom in B, following the trait group color key.
A) Heritability estimated in 13,342 unrelated individuals (for abbreviations see Supplementary Table 4; for details see Supplementary Table 6).
B) Heatmap of: 1) absolute Pearson correlations of standardized trait values in the upper triangle; 2) absolute values of estimated pairwise genetic correlations in the lower triangle. Genetic correlations are estimated in 13,342 unrelated individuals. Values below the diagonal in gray had trait heritability less than 1.5 times the SE of heritability.
Properties of associations shared between traits.
A) Shared genomic associations by pairs of traits. For traits x and y, color in row x and column y reflects the number of loci associated with both traits divided by the number of loci associated with trait x. Traits are presented in the same order as in Extended Data Figure 2A, and the side and top color bars reflect trait groups.
B) Relationship between estimated genetic correlation and extent of sharing of genetic associations. For each trait pair, the extent of locus sharing is defined as the number of loci associated with both traits divided by the total number of loci associated with either trait. Analysis using the absolute value of the Pearson correlation of the residual series results in a very similar pattern. The number of trait pairs in each x-axis category is as follows: 0-1%: 819; 1-10%: 204; 11-20%: 102; 21-30%: 41; 31-40%: 29; 41-50%: 16; >50%: 13. The bar within each box is the median, the box represents the upper and lower quartiles, whiskers extend to 1.5× the interquartile range, and points represent outliers.
Gene-based association of extremely rare variants in APOB with serum total cholesterol.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 19,291.
Gene-based association of rare variants in SECTM1 with HDL2 cholesterol.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test, along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 10,984.
Gene-based association of extremely rare variants in ALDH1L1 with glycine levels.
The upper panel shows the distribution of the covariate adjusted and inverse-normal transformed phenotype. The lower panel displays the association statistics for each variant included in the gene-based test, along with the trait value for minor allele carriers of each variant (orange triangles). SV.P is the P-value from the analysis of each variant in a single-variant analysis. The number of independent individuals in the analysis is 8,206.
Supplementary Material
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Authors: Eveliina Jakkula; Karola Rehnström; Teppo Varilo; Olli P H Pietiläinen; Tiina Paunio; Nancy L Pedersen; Ulf deFaire; Marjo-Riitta Järvelin; Juha Saharinen; Nelson Freimer; Samuli Ripatti; Shaun Purcell; Andrew Collins; Mark J Daly; Aarno Palotie; Leena Peltonen Journal: Am J Hum Genet Date: 2008-12
Authors: Anne Polvi; Henna Linturi; Teppo Varilo; Anna-Kaisa Anttonen; Myles Byrne; Ivo F A C Fokkema; Henrikki Almusa; Anthony Metzidis; Kristiina Avela; Pertti Aula; Marjo Kestilä; Juha Muilu Journal: Hum Mutat Date: 2013-09-13
Authors: Or Zuk; Stephen F Schaffner; Kaitlin Samocha; Ron Do; Eliana Hechter; Sekar Kathiresan; Mark J Daly; Benjamin M Neale; Shamil R Sunyaev; Eric S Lander Journal: Proc Natl Acad Sci U S A Date: 2014-01-17
Authors: Teri A Manolio; Francis S Collins; Nancy J Cox; David B Goldstein; Lucia A Hindorff; David J Hunter; Mark I McCarthy; Erin M Ramos; Lon R Cardon; Aravinda Chakravarti; Judy H Cho; Alan E Guttmacher; Augustine Kong; Leonid Kruglyak; Elaine Mardis; Charles N Rotimi; Montgomery Slatkin; David Valle; Alice S Whittemore; Michael Boehnke; Andrew G Clark; Evan E Eichler; Greg Gibson; Jonathan L Haines; Trudy F C Mackay; Steven A McCarroll; Peter M Visscher Journal: Nature Date: 2009-10-08
Authors: Alisa Manning; Heather M Highland; Jessica Gasser; Xueling Sim; Taru Tukiainen; Pierre Fontanillas; Niels Grarup; Manuel A Rivas; Anubha Mahajan; Adam E Locke; Pablo Cingolani; Tune H Pers; Ana Viñuela; Andrew A Brown; Ying Wu; Jason Flannick; Christian Fuchsberger; Eric R Gamazon; Kyle J Gaulton; Hae Kyung Im; Tanya M Teslovich; Thomas W Blackwell; Jette Bork-Jensen; Noël P Burtt; Yuhui Chen; Todd Green; Christopher Hartl; Hyun Min Kang; Ashish Kumar; Claes Ladenvall; Clement Ma; Loukas Moutsianas; Richard D Pearson; John R B Perry; N William Rayner; Neil R Robertson; Laura J Scott; Martijn van de Bunt; Johan G Eriksson; Antti Jula; Seppo Koskinen; Terho Lehtimäki; Aarno Palotie; Olli T Raitakari; Suzanne B R Jacobs; Jennifer Wessel; Audrey Y Chu; Robert A Scott; Mark O Goodarzi; Christine Blancher; Gemma Buck; David Buck; Peter S Chines; Stacey Gabriel; Anette P Gjesing; Christopher J Groves; Mette Hollensted; Jeroen R Huyghe; Anne U Jackson; Goo Jun; Johanne Marie Justesen; Massimo Mangino; Jacquelyn Murphy; Matt Neville; Robert Onofrio; Kerrin S Small; Heather M Stringham; Joseph Trakalo; Eric Banks; Jason Carey; Mauricio O Carneiro; Mark DePristo; Yossi Farjoun; Timothy Fennell; Jacqueline I Goldstein; George Grant; Martin Hrabé de Angelis; Jared Maguire; Benjamin M Neale; Ryan Poplin; Shaun Purcell; Thomas Schwarzmayr; Khalid Shakir; Joshua D Smith; Tim M Strom; Thomas Wieland; Jaana Lindstrom; Ivan Brandslund; Cramer Christensen; Gabriela L Surdulescu; Timo A Lakka; Alex S F Doney; Peter Nilsson; Nicholas J Wareham; Claudia Langenberg; Tibor V Varga; Paul W Franks; Olov Rolandsson; Anders H Rosengren; Vidya S Farook; Farook Thameem; Sobha Puppala; Satish Kumar; Donna M Lehman; Christopher P Jenkinson; Joanne E Curran; Daniel Esten Hale; Sharon P Fowler; Rector Arya; Ralph A DeFronzo; Hanna E Abboud; Ann-Christine Syvänen; Pamela J Hicks; Nicholette D Palmer; Maggie C Y Ng; Donald W Bowden; Barry I Freedman; Tõnu Esko; Reedik Mägi; Lili Milani; 
Evelin Mihailov; Andres Metspalu; Narisu Narisu; Leena Kinnunen; Lori L Bonnycastle; Amy Swift; Dorota Pasko; Andrew R Wood; João Fadista; Toni I Pollin; Nir Barzilai; Gil Atzmon; Benjamin Glaser; Barbara Thorand; Konstantin Strauch; Annette Peters; Michael Roden; Martina Müller-Nurasyid; Liming Liang; Jennifer Kriebel; Thomas Illig; Harald Grallert; Christian Gieger; Christa Meisinger; Lars Lannfelt; Solomon K Musani; Michael Griswold; Herman A Taylor; Gregory Wilson; Adolfo Correa; Heikki Oksa; William R Scott; Uzma Afzal; Sian-Tsung Tan; Marie Loh; John C Chambers; Jobanpreet Sehmi; Jaspal Singh Kooner; Benjamin Lehne; Yoon Shin Cho; Jong-Young Lee; Bok-Ghee Han; Annemari Käräjämäki; Qibin Qi; Lu Qi; Jinyan Huang; Frank B Hu; Olle Melander; Marju Orho-Melander; Jennifer E Below; David Aguilar; Tien Yin Wong; Jianjun Liu; Chiea-Chuen Khor; Kee Seng Chia; Wei Yen Lim; Ching-Yu Cheng; Edmund Chan; E Shyong Tai; Tin Aung; Allan Linneberg; Bo Isomaa; Thomas Meitinger; Tiinamaija Tuomi; Liisa Hakaste; Jasmina Kravic; Marit E Jørgensen; Torsten Lauritzen; Panos Deloukas; Kathleen E Stirrups; Katharine R Owen; Andrew J Farmer; Timothy M Frayling; Stephen P O'Rahilly; Mark Walker; Jonathan C Levy; Dylan Hodgkiss; Andrew T Hattersley; Teemu Kuulasmaa; Alena Stančáková; Inês Barroso; Dwaipayan Bharadwaj; Juliana Chan; Giriraj R Chandak; Mark J Daly; Peter J Donnelly; Shah B Ebrahim; Paul Elliott; Tasha Fingerlin; Philippe Froguel; Cheng Hu; Weiping Jia; Ronald C W Ma; Gilean McVean; Taesung Park; Dorairaj Prabhakaran; Manjinder Sandhu; James Scott; Rob Sladek; Nikhil Tandon; Yik Ying Teo; Eleftheria Zeggini; Richard M Watanabe; Heikki A Koistinen; Y Antero Kesaniemi; Matti Uusitupa; Timothy D Spector; Veikko Salomaa; Rainer Rauramaa; Colin N A Palmer; Inga Prokopenko; Andrew D Morris; Richard N Bergman; Francis S Collins; Lars Lind; Erik Ingelsson; Jaakko Tuomilehto; Fredrik Karpe; Leif Groop; Torben Jørgensen; Torben Hansen; Oluf Pedersen; Johanna Kuusisto; Gonçalo 
Abecasis; Graeme I Bell; John Blangero; Nancy J Cox; Ravindranath Duggirala; Mark Seielstad; James G Wilson; Josee Dupuis; Samuli Ripatti; Craig L Hanis; Jose C Florez; Karen L Mohlke; James B Meigs; Markku Laakso; Andrew P Morris; Michael Boehnke; David Altshuler; Mark I McCarthy; Anna L Gloyn; Cecilia M Lindgren Journal: Diabetes Date: 2017-03-24 Impact factor: 9.461
Authors: Yali Xue; Massimo Mezzavilla; Marc Haber; Shane McCarthy; Yuan Chen; Vagheesh Narasimhan; Arthur Gilly; Qasim Ayub; Vincenza Colonna; Lorraine Southam; Christopher Finan; Andrea Massaia; Himanshu Chheda; Priit Palta; Graham Ritchie; Jennifer Asimit; George Dedoussis; Paolo Gasparini; Aarno Palotie; Samuli Ripatti; Nicole Soranzo; Daniela Toniolo; James F Wilson; Richard Durbin; Chris Tyler-Smith; Eleftheria Zeggini Journal: Nat Commun Date: 2017-06-23 Impact factor: 14.919
Authors: Eirini Marouli; Mariaelisa Graff; Carolina Medina-Gomez; Ken Sin Lo; Andrew R Wood; Troels R Kjaer; Rebecca S Fine; Yingchang Lu; Claudia Schurmann; Heather M Highland; Sina Rüeger; Gudmar Thorleifsson; Anne E Justice; David Lamparter; Kathleen E Stirrups; Valérie Turcot; Kristin L Young; Thomas W Winkler; Tõnu Esko; Tugce Karaderi; Adam E Locke; Nicholas G D Masca; Maggie C Y Ng; Poorva Mudgal; Manuel A Rivas; Sailaja Vedantam; Anubha Mahajan; Xiuqing Guo; Goncalo Abecasis; Katja K Aben; Linda S Adair; Dewan S Alam; Eva Albrecht; Kristine H Allin; Matthew Allison; Philippe Amouyel; Emil V Appel; Dominique Arveiler; Folkert W Asselbergs; Paul L Auer; Beverley Balkau; Bernhard Banas; Lia E Bang; Marianne Benn; Sven Bergmann; Lawrence F Bielak; Matthias Blüher; Heiner Boeing; Eric Boerwinkle; Carsten A Böger; Lori L Bonnycastle; Jette Bork-Jensen; Michiel L Bots; Erwin P Bottinger; Donald W Bowden; Ivan Brandslund; Gerome Breen; Murray H Brilliant; Linda Broer; Amber A Burt; Adam S Butterworth; David J Carey; Mark J Caulfield; John C Chambers; Daniel I Chasman; Yii-Der Ida Chen; Rajiv Chowdhury; Cramer Christensen; Audrey Y Chu; Massimiliano Cocca; Francis S Collins; James P Cook; Janie Corley; Jordi Corominas Galbany; Amanda J Cox; Gabriel Cuellar-Partida; John Danesh; Gail Davies; Paul I W de Bakker; Gert J de Borst; Simon de Denus; Mark C H de Groot; Renée de Mutsert; Ian J Deary; George Dedoussis; Ellen W Demerath; Anneke I den Hollander; Joe G Dennis; Emanuele Di Angelantonio; Fotios Drenos; Mengmeng Du; Alison M Dunning; Douglas F Easton; Tapani Ebeling; Todd L Edwards; Patrick T Ellinor; Paul Elliott; Evangelos Evangelou; Aliki-Eleni Farmaki; Jessica D Faul; Mary F Feitosa; Shuang Feng; Ele Ferrannini; Marco M Ferrario; Jean Ferrieres; Jose C Florez; Ian Ford; Myriam Fornage; Paul W Franks; Ruth Frikke-Schmidt; Tessel E Galesloot; Wei Gan; Ilaria Gandin; Paolo Gasparini; Vilmantas Giedraitis; Ayush Giri; Giorgia Girotto; Scott D Gordon; Penny 
Gordon-Larsen; Mathias Gorski; Niels Grarup; Megan L Grove; Vilmundur Gudnason; Stefan Gustafsson; Torben Hansen; Kathleen Mullan Harris; Tamara B Harris; Andrew T Hattersley; Caroline Hayward; Liang He; Iris M Heid; Kauko Heikkilä; Øyvind Helgeland; Jussi Hernesniemi; Alex W Hewitt; Lynne J Hocking; Mette Hollensted; Oddgeir L Holmen; G Kees Hovingh; Joanna M M Howson; Carel B Hoyng; Paul L Huang; Kristian Hveem; M Arfan Ikram; Erik Ingelsson; Anne U Jackson; Jan-Håkan Jansson; Gail P Jarvik; Gorm B Jensen; Min A Jhun; Yucheng Jia; Xuejuan Jiang; Stefan Johansson; Marit E Jørgensen; Torben Jørgensen; Pekka Jousilahti; J Wouter Jukema; Bratati Kahali; René S Kahn; Mika Kähönen; Pia R Kamstrup; Stavroula Kanoni; Jaakko Kaprio; Maria Karaleftheri; Sharon L R Kardia; Fredrik Karpe; Frank Kee; Renske Keeman; Lambertus A Kiemeney; Hidetoshi Kitajima; Kirsten B Kluivers; Thomas Kocher; Pirjo Komulainen; Jukka Kontto; Jaspal S Kooner; Charles Kooperberg; Peter Kovacs; Jennifer Kriebel; Helena Kuivaniemi; Sébastien Küry; Johanna Kuusisto; Martina La Bianca; Markku Laakso; Timo A Lakka; Ethan M Lange; Leslie A Lange; Carl D Langefeld; Claudia Langenberg; Eric B Larson; I-Te Lee; Terho Lehtimäki; Cora E Lewis; Huaixing Li; Jin Li; Ruifang Li-Gao; Honghuang Lin; Li-An Lin; Xu Lin; Lars Lind; Jaana Lindström; Allan Linneberg; Yeheng Liu; Yongmei Liu; Artitaya Lophatananon; Jian'an Luan; Steven A Lubitz; Leo-Pekka Lyytikäinen; David A Mackey; Pamela A F Madden; Alisa K Manning; Satu Männistö; Gaëlle Marenne; Jonathan Marten; Nicholas G Martin; Angela L Mazul; Karina Meidtner; Andres Metspalu; Paul Mitchell; Karen L Mohlke; Dennis O Mook-Kanamori; Anna Morgan; Andrew D Morris; Andrew P Morris; Martina Müller-Nurasyid; Patricia B Munroe; Mike A Nalls; Matthias Nauck; Christopher P Nelson; Matt Neville; Sune F Nielsen; Kjell Nikus; Pål R Njølstad; Børge G Nordestgaard; Ioanna Ntalla; Jeffrey R O'Connel; Heikki Oksa; Loes M Olde Loohuis; Roel A Ophoff; Katharine R Owen; Chris J 
Packard; Sandosh Padmanabhan; Colin N A Palmer; Gerard Pasterkamp; Aniruddh P Patel; Alison Pattie; Oluf Pedersen; Peggy L Peissig; Gina M Peloso; Craig E Pennell; Markus Perola; James A Perry; John R B Perry; Thomas N Person; Ailith Pirie; Ozren Polasek; Danielle Posthuma; Olli T Raitakari; Asif Rasheed; Rainer Rauramaa; Dermot F Reilly; Alex P Reiner; Frida Renström; Paul M Ridker; John D Rioux; Neil Robertson; Antonietta Robino; Olov Rolandsson; Igor Rudan; Katherine S Ruth; Danish Saleheen; Veikko Salomaa; Nilesh J Samani; Kevin Sandow; Yadav Sapkota; Naveed Sattar; Marjanka K Schmidt; Pamela J Schreiner; Matthias B Schulze; Robert A Scott; Marcelo P Segura-Lepe; Svati Shah; Xueling Sim; Suthesh Sivapalaratnam; Kerrin S Small; Albert Vernon Smith; Jennifer A Smith; Lorraine Southam; Timothy D Spector; Elizabeth K Speliotes; John M Starr; Valgerdur Steinthorsdottir; Heather M Stringham; Michael Stumvoll; Praveen Surendran; Leen M 't Hart; Katherine E Tansey; Jean-Claude Tardif; Kent D Taylor; Alexander Teumer; Deborah J Thompson; Unnur Thorsteinsdottir; Betina H Thuesen; Anke Tönjes; Gerard Tromp; Stella Trompet; Emmanouil Tsafantakis; Jaakko Tuomilehto; Anne Tybjaerg-Hansen; Jonathan P Tyrer; Rudolf Uher; André G Uitterlinden; Sheila Ulivi; Sander W van der Laan; Andries R Van Der Leij; Cornelia M van Duijn; Natasja M van Schoor; Jessica van Setten; Anette Varbo; Tibor V Varga; Rohit Varma; Digna R Velez Edwards; Sita H Vermeulen; Henrik Vestergaard; Veronique Vitart; Thomas F Vogt; Diego Vozzi; Mark Walker; Feijie Wang; Carol A Wang; Shuai Wang; Yiqin Wang; Nicholas J Wareham; Helen R Warren; Jennifer Wessel; Sara M Willems; James G Wilson; Daniel R Witte; Michael O Woods; Ying Wu; Hanieh Yaghootkar; Jie Yao; Pang Yao; Laura M Yerges-Armstrong; Robin Young; Eleftheria Zeggini; Xiaowei Zhan; Weihua Zhang; Jing Hua Zhao; Wei Zhao; Wei Zhao; He Zheng; Wei Zhou; Jerome I Rotter; Michael Boehnke; Sekar Kathiresan; Mark I McCarthy; Cristen J Willer; Kari Stefansson; 
Ingrid B Borecki; Dajiang J Liu; Kari E North; Nancy L Heard-Costa; Tune H Pers; Cecilia M Lindgren; Claus Oxvig; Zoltán Kutalik; Fernando Rivadeneira; Ruth J F Loos; Timothy M Frayling; Joel N Hirschhorn; Panos Deloukas; Guillaume Lettre Journal: Nature Date: 2017-02-01 Impact factor: 49.962
Authors: Josep M Mercader; Christian Fuchsberger; Miriam S Udler; Anubha Mahajan; Jason Flannick; Jennifer Wessel; Tanya M Teslovich; Lizz Caulkins; Ryan Koesterer; Francisco Barajas-Olmos; Thomas W Blackwell; Eric Boerwinkle; Jennifer A Brody; Federico Centeno-Cruz; Ling Chen; Siying Chen; Cecilia Contreras-Cubas; Emilio Córdova; Adolfo Correa; Maria Cortes; Ralph A DeFronzo; Lawrence Dolan; Kimberly L Drews; Amanda Elliott; James S Floyd; Stacey Gabriel; Maria Eugenia Garay-Sevilla; Humberto García-Ortiz; Myron Gross; Sohee Han; Nancy L Heard-Costa; Anne U Jackson; Marit E Jørgensen; Hyun Min Kang; Megan Kelsey; Bong-Jo Kim; Heikki A Koistinen; Johanna Kuusisto; Joseph B Leader; Allan Linneberg; Ching-Ti Liu; Jianjun Liu; Valeriya Lyssenko; Alisa K Manning; Anthony Marcketta; Juan Manuel Malacara-Hernandez; Angélica Martínez-Hernández; Karen Matsuo; Elizabeth Mayer-Davis; Elvia Mendoza-Caamal; Karen L Mohlke; Alanna C Morrison; Anne Ndungu; Maggie C Y Ng; Colm O'Dushlaine; Anthony J Payne; Catherine Pihoker; Wendy S Post; Michael Preuss; Bruce M Psaty; Ramachandran S Vasan; N William Rayner; Alexander P Reiner; Cristina Revilla-Monsalve; Neil R Robertson; Nicola Santoro; Claudia Schurmann; Wing Yee So; Xavier Soberón; Heather M Stringham; Tim M Strom; Claudia H T Tam; Farook Thameem; Brian Tomlinson; Jason M Torres; Russell P Tracy; Rob M van Dam; Marijana Vujkovic; Shuai Wang; Ryan P Welch; Daniel R Witte; Tien-Yin Wong; Gil Atzmon; Nir Barzilai; John Blangero; Lori L Bonnycastle; Donald W Bowden; John C Chambers; Edmund Chan; Ching-Yu Cheng; Yoon Shin Cho; Francis S Collins; Paul S de Vries; Ravindranath Duggirala; Benjamin Glaser; Clicerio Gonzalez; Ma Elena Gonzalez; Leif Groop; Jaspal Singh Kooner; Soo Heon Kwak; Markku Laakso; Donna M Lehman; Peter Nilsson; Timothy D Spector; E Shyong Tai; Tiinamaija Tuomi; Jaakko Tuomilehto; James G Wilson; Carlos A Aguilar-Salinas; Erwin Bottinger; Brian Burke; David J Carey; Juliana C N Chan; Josée Dupuis; Philippe 
Frossard; Susan R Heckbert; Mi Yeong Hwang; Young Jin Kim; H Lester Kirchner; Jong-Young Lee; Juyoung Lee; Ruth J F Loos; Ronald C W Ma; Andrew D Morris; Christopher J O'Donnell; Colin N A Palmer; James Pankow; Kyong Soo Park; Asif Rasheed; Danish Saleheen; Xueling Sim; Kerrin S Small; Yik Ying Teo; Christopher Haiman; Craig L Hanis; Brian E Henderson; Lorena Orozco; Teresa Tusié-Luna; Frederick E Dewey; Aris Baras; Christian Gieger; Thomas Meitinger; Konstantin Strauch; Leslie Lange; Niels Grarup; Torben Hansen; Oluf Pedersen; Philip Zeitler; Dana Dabelea; Goncalo Abecasis; Graeme I Bell; Nancy J Cox; Mark Seielstad; Rob Sladek; James B Meigs; Steve S Rich; Jerome I Rotter; David Altshuler; Noël P Burtt; Laura J Scott; Andrew P Morris; Jose C Florez; Mark I McCarthy; Michael Boehnke Journal: Nature Date: 2019-05-22 Impact factor: 49.962
Authors: Hye In Kim; Bin Ye; Nehal Gosalia; Çiğdem Köroğlu; Robert L Hanson; Wen-Chi Hsueh; William C Knowler; Leslie J Baier; Clifton Bogardus; Alan R Shuldiner; Cristopher V Van Hout Journal: Am J Hum Genet Date: 2020-07-07 Impact factor: 11.025
Authors: Todd Lencz; Jin Yu; Raiyan Rashid Khan; Erin Flaherty; Shai Carmi; Max Lam; Danny Ben-Avraham; Nir Barzilai; Susan Bressman; Ariel Darvasi; Judy H Cho; Lorraine N Clark; Zeynep H Gümüş; Joseph Vijai; Robert J Klein; Steven Lipkin; Kenneth Offit; Harry Ostrer; Laurie J Ozelius; Inga Peter; Anil K Malhotra; Tom Maniatis; Gil Atzmon; Itsik Pe'er Journal: Neuron Date: 2021-03-22 Impact factor: 17.173
Authors: Mariaelisa Graff; Anne E Justice; Kristin L Young; Eirini Marouli; Xinruo Zhang; Rebecca S Fine; Elise Lim; Victoria Buchanan; Kristin Rand; Mary F Feitosa; Mary K Wojczynski; Lisa R Yanek; Yaming Shao; Rebecca Rohde; Adebowale A Adeyemo; Melinda C Aldrich; Matthew A Allison; Christine B Ambrosone; Stefan Ambs; Christopher Amos; Donna K Arnett; Larry Atwood; Elisa V Bandera; Traci Bartz; Diane M Becker; Sonja I Berndt; Leslie Bernstein; Lawrence F Bielak; William J Blot; Erwin P Bottinger; Donald W Bowden; Jonathan P Bradfield; Jennifer A Brody; Ulrich Broeckel; Gregory Burke; Brian E Cade; Qiuyin Cai; Neil Caporaso; Chris Carlson; John Carpten; Graham Casey; Stephen J Chanock; Guanjie Chen; Minhui Chen; Yii-Der I Chen; Wei-Min Chen; Alessandra Chesi; Charleston W K Chiang; Lisa Chu; Gerry A Coetzee; David V Conti; Richard S Cooper; Mary Cushman; Ellen Demerath; Sandra L Deming; Latchezar Dimitrov; Jingzhong Ding; W Ryan Diver; Qing Duan; Michele K Evans; Adeyinka G Falusi; Jessica D Faul; Myriam Fornage; Caroline Fox; Barry I Freedman; Melissa Garcia; Elizabeth M Gillanders; Phyllis Goodman; Omri Gottesman; Struan F A Grant; Xiuqing Guo; Hakon Hakonarson; Talin Haritunians; Tamara B Harris; Curtis C Harris; Brian E Henderson; Anselm Hennis; Dena G Hernandez; Joel N Hirschhorn; Lorna Haughton McNeill; Timothy D Howard; Barbara Howard; Ann W Hsing; Yu-Han H Hsu; Jennifer J Hu; Chad D Huff; Dezheng Huo; Sue A Ingles; Marguerite R Irvin; Esther M John; Karen C Johnson; Joanne M Jordan; Edmond K Kabagambe; Sun J Kang; Sharon L Kardia; Brendan J Keating; Rick A Kittles; Eric A Klein; Suzanne Kolb; Laurence N Kolonel; Charles Kooperberg; Lewis Kuller; Abdullah Kutlar; Leslie Lange; Carl D Langefeld; Loic Le Marchand; Hampton Leonard; Guillaume Lettre; Albert M Levin; Yun Li; Jin Li; Yongmei Liu; Youfang Liu; Simin Liu; Kurt Lohman; Vaneet Lotay; Yingchang Lu; William Maixner; JoAnn E Manson; Barbara McKnight; Yan Meng; Keri L Monda; Kris Monroe; Jason H Moore; 
Thomas H Mosley; Poorva Mudgal; Adam B Murphy; Rajiv Nadukuru; Mike A Nalls; Katherine L Nathanson; Uma Nayak; Amidou N'Diaye; Barbara Nemesure; Christine Neslund-Dudas; Marian L Neuhouser; Sarah Nyante; Heather Ochs-Balcom; Temidayo O Ogundiran; Adesola Ogunniyi; Oladosu Ojengbede; Hayrettin Okut; Olufunmilayo I Olopade; Andrew Olshan; Badri Padhukasahasram; Julie Palmer; Cameron D Palmer; Nicholette D Palmer; George Papanicolaou; Sanjay R Patel; Curtis A Pettaway; Patricia A Peyser; Michael F Press; D C Rao; Laura J Rasmussen-Torvik; Susan Redline; Alex P Reiner; Suhn K Rhie; Jorge L Rodriguez-Gil; Charles N Rotimi; Jerome I Rotter; Edward A Ruiz-Narvaez; Benjamin A Rybicki; Babatunde Salako; Michele M Sale; Maureen Sanderson; Eric Schadt; Pamela J Schreiner; Claudia Schurmann; Ann G Schwartz; Daniel A Shriner; Lisa B Signorello; Andrew B Singleton; David S Siscovick; Jennifer A Smith; Shad Smith; Elizabeth Speliotes; Margaret Spitz; Janet L Stanford; Victoria L Stevens; Alex Stram; Sara S Strom; Lara Sucheston; Yan V Sun; Salman M Tajuddin; Herman Taylor; Kira Taylor; Bamidele O Tayo; Michael J Thun; Margaret A Tucker; Dhananjay Vaidya; David J Van Den Berg; Sailaja Vedantam; Mara Vitolins; Zhaoming Wang; Erin B Ware; Sylvia Wassertheil-Smoller; David R Weir; John K Wiencke; Scott M Williams; L Keoki Williams; James G Wilson; John S Witte; Margaret Wrensch; Xifeng Wu; Jie Yao; Neil Zakai; Krista Zanetti; Babette S Zemel; Wei Zhao; Jing Hua Zhao; Wei Zheng; Degui Zhi; Jie Zhou; Xiaofeng Zhu; Regina G Ziegler; Joe Zmuda; Alan B Zonderman; Bruce M Psaty; Ingrid B Borecki; L Adrienne Cupples; Ching-Ti Liu; Christopher A Haiman; Ruth Loos; Maggie C Y Ng; Kari E North Journal: Am J Hum Genet Date: 2021-03-12 Impact factor: 11.025
Authors: Aaro Salosensaari; Ville Laitinen; Leo Lahti; Teemu Niiranen; Aki S Havulinna; Guillaume Meric; Susan Cheng; Markus Perola; Liisa Valsta; Georg Alfthan; Michael Inouye; Jeramie D Watrous; Tao Long; Rodolfo A Salido; Karenina Sanders; Caitriona Brennan; Gregory C Humphrey; Jon G Sanders; Mohit Jain; Pekka Jousilahti; Veikko Salomaa; Rob Knight Journal: Nat Commun Date: 2021-05-11 Impact factor: 14.919
Authors: Lei Chen; Haley J Abel; Indraniel Das; David E Larson; Liron Ganel; Krishna L Kanchi; Allison A Regier; Erica P Young; Chul Joo Kang; Alexandra J Scott; Colby Chiang; Xinxin Wang; Shuangjia Lu; Ryan Christ; Susan K Service; Charleston W K Chiang; Aki S Havulinna; Johanna Kuusisto; Michael Boehnke; Markku Laakso; Aarno Palotie; Samuli Ripatti; Nelson B Freimer; Adam E Locke; Nathan O Stitziel; Ira M Hall Journal: Am J Hum Genet Date: 2021-04-01 Impact factor: 11.025
Authors: Liron Ganel; Lei Chen; Ryan Christ; Jagadish Vangipurapu; Erica Young; Indraniel Das; Krishna Kanchi; David Larson; Allison Regier; Haley Abel; Chul Joo Kang; Alexandra Scott; Aki Havulinna; Charleston W K Chiang; Susan Service; Nelson Freimer; Aarno Palotie; Samuli Ripatti; Johanna Kuusisto; Michael Boehnke; Markku Laakso; Adam Locke; Nathan O Stitziel; Ira M Hall Journal: Hum Genomics Date: 2021-06-07 Impact factor: 6.481