Lorraine Southam1,2, Arthur Gilly1, Dániel Süveges1, Aliki-Eleni Farmaki3, Jeremy Schwartzentruber1, Ioanna Tachmazidou1, Angela Matchan1, Nigel W Rayner1,2,4, Emmanouil Tsafantakis5, Maria Karaleftheri6, Yali Xue1, George Dedoussis3, Eleftheria Zeggini1.
Abstract
Next-generation association studies can be empowered by sequence-based imputation and by studying founder populations. Here we report ∼9.5 million variants from whole-genome sequencing (WGS) of a Cretan-isolated population, and show enrichment of rare and low-frequency variants with predicted functional consequences. We use a WGS-based imputation approach utilizing 10,422 reference haplotypes to perform genome-wide association analyses and observe 17 genome-wide significant, independent signals, including replicating evidence for association at eight novel low-frequency variant signals. Two novel cardiometabolic associations are at lead variants unique to the founder population sequences: chr16:70790626 (high-density lipoprotein levels beta −1.71 (SE 0.25), P=1.57 × 10⁻¹¹, effect allele frequency (EAF) 0.006); and rs145556679 (triglyceride levels beta −1.13 (SE 0.17), P=2.53 × 10⁻¹¹, EAF 0.013). Our findings add empirical support to the contribution of low-frequency variants in complex traits, demonstrate the advantage of including population-specific sequences in imputation panels and exemplify the power gains afforded by population isolates.
Year: 2017 PMID: 28548082 PMCID: PMC5458552 DOI: 10.1038/ncomms15606
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Flowchart of study design.
The HELIC cohorts were prephased, imputed and analysed separately by cohort and array, and finally meta-analysed. The variant numbers reported here are total regardless of MAF. Imputed variants are for chromosomes 1–22.
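The final meta-analysis step can be illustrated with a minimal sketch. It uses a plain fixed-effect, inverse-variance-weighted combination, which is an assumption made here for illustration: the study itself used METACARPA, which additionally corrects for sample overlap. The function name `inverse_variance_meta` is hypothetical, and the input values are the rounded per-array estimates reported for chr16:70790626 and HDL in the table of novel association signals.

```python
import math

def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance meta-analysis of (beta, se) pairs.

    Illustrative only: no correction for sample overlap is applied,
    unlike the METACARPA analysis described in the source.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

# Rounded per-array estimates for chr16:70790626 and HDL:
# MANOLIS CoreExome -1.885 (0.994), MANOLIS OmniExome -1.702 (0.263).
per_array = [(-1.885, 0.994), (-1.702, 0.263)]
beta, se = inverse_variance_meta(per_array)
print(f"combined beta = {beta:.3f}, SE = {se:.3f}")
```

With these rounded inputs the combined estimate lands close to the reported meta-analysis value of −1.713 (0.254); small differences are expected from rounding and from the overlap correction that this sketch omits.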
Figure 2. Variant sharing and functional annotation.
(a) SNP density per kbp and percentage of total per functional class, based on 9,554,503 variants identified in the HELIC MANOLIS 4 × WGS data of 249 samples (MAC≥2). Error bars indicate standard error of the mean; the dashed red line indicates average density genome-wide. (b) Variant overlap between 498 HELIC MANOLIS, 7,582 UK10K and 2,184 1000 Genomes Project reference panel haplotypes, by MAF category. Numerical values are given in Supplementary Tables 1 and 2.
Figure 3. Functional enrichment of variants private to the MANOLIS sequences when compared to variants shared with UK10K and/or 1000 Genomes.
Enrichment and depletion of functional classes among variants private to the MANOLIS cohort can be observed for rare and low-frequency variants (MAF≤5%), while no significant enrichment is detected for common variants in any functional class. Numerical values are listed in Supplementary Table 4.
Figure 4. False-positive rate and meta-analysis power in the presence of sample overlap using METACARPA.
(a) Empirical false-positive rate as a function of sample overlap in 1,000 repeats of a meta-analysis of two studies including 2,000 samples each, at a significance threshold of 5 × 10⁻⁸. (b) Empirical power of the four tests implemented in METACARPA as a function of sample overlap in the same simulation setting. Power is calculated as the discovery rate of a SNP explaining 1% of the variance of a standard normal phenotype under the same simulation scenario (for example, a MAF of 1% and an effect size of 0.705, or a MAF of 20% and an effect size of 0.176). (c) Comparison of the accuracy of Digby's estimate of tetrachoric correlation and Pearson's correlation for a true (dashed line) 25% overlap under a polygenic burden, with 10,000 SNPs affecting a quantitative trait with 20% heritability. Estimates of correlation for both methods are calculated over 300 genome-wide simulations. The black line indicates the median, shaded rectangles represent the interquintile ranges.
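The two example settings in panel (b) can be checked with a short calculation. The sketch below assumes an additive model, Hardy-Weinberg equilibrium and a phenotype with unit variance, under which a SNP with minor allele frequency p and per-allele effect β explains 2p(1−p)β² of the variance. These assumptions and the function name `variance_explained` are introduced here for illustration and are not stated in the source.

```python
def variance_explained(maf, beta):
    """Fraction of phenotypic variance explained by one additive SNP.

    Assumes Hardy-Weinberg equilibrium and a unit-variance phenotype
    (illustrative assumptions, not taken from the source).
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2

# The two scenarios quoted in the caption.
for maf, beta in [(0.01, 0.705), (0.20, 0.176)]:
    share = variance_explained(maf, beta)
    print(f"MAF={maf:.2f}, beta={beta:.3f}: {100 * share:.2f}% of variance")
```

Both settings come out at roughly 1% of the variance, which is consistent with the caption's description of a SNP explaining 1% of a standard normal phenotype.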
Summary statistics at established loci.
| Variant | Cohorts | Trait | Chr:pos (EA/NEA) | Variant consequence | EAF | Beta (SE) | P | N | Reported variant | Reported PMID | Conditional P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs7412 | MANOLIS & Pomak | LDL | 19:45412079 (T/C) | Missense (p.Arg176Cys) | 0.079 | −0.419 (0.047) | 2.64 × 10⁻¹⁹ | 3168 | rs7412 | 22286219 | NA |
| | | TC | | | 0.079 | −0.27 (0.047) | 1.05 × 10⁻⁸ | 3170 | | | |
| rs7553007 | MANOLIS & Pomak | CRP | 1:159698549 (A/G) | Intergenic | 0.327 | −0.202 (0.029) | 6.80 × 10⁻¹² | 2689 | rs7553007 | 19567438 | NA |
| rs964184 | MANOLIS & Pomak | VLDL | 11:116648917 (G/C) | 3′ UTR | 0.163 | 0.242 (0.035) | 3.68 × 10⁻¹² | 3170 | rs964184 | 24097068 | NA |
| | | TG | | | 0.163 | 0.236 (0.035) | 1.52 × 10⁻¹¹ | 3164 | | | |
| rs76353203 | MANOLIS | TG | 11:116701353 (T/C) | Stop-gain (p.Arg19Ter) | 0.022 | −1.073 (0.129) | 6.88 × 10⁻¹⁷ | 1461 | rs76353203 | 24343240 | NA |
| | | HDL | | | 0.022 | 0.919 (0.13) | 1.78 × 10⁻¹² | 1465 | | | |
| rs150641967 | MANOLIS & Pomak | LDL | 19:19370340 (T/TGACA) | Intronic | 0.075 | −0.326 (0.049) | 3.49 × 10⁻¹¹ | 3168 | rs10401969 | 24097068 | 9.34 × 10⁻¹ |
| | | TC | | | 0.074 | −0.322 (0.046) | 8.29 × 10⁻¹¹ | 3170 | | | 8.71 × 10⁻¹ |
| | | TG | | | 0.074 | −0.278 (0.05) | 2.49 × 10⁻⁸ | 3164 | | | 3.94 × 10⁻¹ |
| | | VLDL | | | 0.075 | −0.282 (0.05) | 1.48 × 10⁻⁸ | 3170 | | | 3.51 × 10⁻¹ |
| rs35237252 | MANOLIS & Pomak | HDL | 8:19870271 (A/C) | Regulatory region | 0.277 | 0.183 (0.029) | 4.04 × 10⁻¹⁰ | 3172 | rs2083637 | 19060911 | 1.39 × 10⁻¹ |
| rs200751500 | MANOLIS & Pomak | HDL | 16:57001274 (A/AC) | Intronic | 0.33 | 0.294 (0.028) | 4.02 × 10⁻²⁵ | 3172 | rs1532624 | 19060911 | 1.18 × 10⁻⁴ |
| rs1331309 | MANOLIS & Pomak | MCH | 6:135406178 (G/T) | Intronic | 0.228 | 0.201 (0.033) | 1.90 × 10⁻⁹ | 2829 | rs7775698 | 20139978 | 3.59 × 10⁻¹ |
| rs9804550 | Pomak | WBC | 11:5186093 (T/C) | Intronic | 0.051 | 0.52 (0.081) | 1.10 × 10⁻¹⁰ | 1673 | rs7116019 | 25373335 | 5.33 × 10⁻⁶ |
| | | MCH | | | 0.053 | −0.627 (0.079) | 2.19 × 10⁻¹⁵ | 1647 | | | 1.43 × 10⁻² |
| | | MCHC | | | 0.054 | 0.894 (0.075) | 8.46 × 10⁻³³ | 1669 | | | 1.46 × 10⁻⁴ |
| | | MCV | | | 0.052 | −1.071 (0.076) | 1.57 × 10⁻⁴⁵ | 1658 | | | 2.71 × 10⁻⁵ |
| | | RBC | | | 0.054 | 0.473 (0.077) | 8.58 × 10⁻¹⁰ | 1718 | | | 3.56 × 10⁻² |
Lead variants for validated, previously reported association signals reaching P<5.00 × 10⁻⁸. Cohorts, cohorts from which the signal arose; Chr:pos, chromosome and position in GRCh37/hg19 coordinates; Variant consequence, taken from Ensembl (http://www.ensembl.org); Human Genome Variation Society variant nomenclature (http://www.HGVS.org/varnomen) is provided for exonic variants. The other abbreviations are: EA, effect allele; NEA, non-effect allele; EAF, effect allele frequency; P, the Wald test P-value from the association analysis using METACARPA; N, sample size; Reported variant, RS-id of the reported signal; Reported genes, the gene(s) in which the signal was reported; Reported PMID, PubMed ID for the reported GWAS signal; Conditional P, Wald test P-value from the association analysis using METACARPA of the variant after conditioning on the reported variant, confirming the signals are conditionally dependent; NA, indicates that conditional analysis is not applicable since the variant is the same as the reported variant; LDL, low-density lipoprotein cholesterol; TC, total cholesterol; CRP, C-reactive protein; VLDL, very low-density lipoprotein cholesterol; TG, triglycerides; HDL, high-density lipoprotein cholesterol; MCH, mean corpuscular haemoglobin; WBC, white blood cells; MCHC, mean corpuscular haemoglobin concentration; MCV, mean corpuscular volume; RBC, red blood cells.
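As a rough illustration of how a Wald test P-value relates to the effect estimate and its standard error, the sketch below computes a two-sided P-value from z = beta/SE under a standard normal reference distribution. That reference distribution and the function name `wald_p` are assumptions made here; this is not METACARPA's implementation, and the inputs are the rounded table values, so only order-of-magnitude agreement with the reported P-values should be expected.

```python
import math

def wald_p(beta, se):
    """Two-sided Wald P-value, assuming z = beta / se is standard normal."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

GENOME_WIDE = 5e-8  # significance threshold used in the table legend

# Rounded (beta, SE) pairs taken from the table of established loci.
examples = {
    "rs7412 / LDL": (-0.419, 0.047),
    "rs7553007 / CRP": (-0.202, 0.029),
}
for label, (beta, se) in examples.items():
    p = wald_p(beta, se)
    print(f"{label}: P ~ {p:.2e}, genome-wide significant: {p < GENOME_WIDE}")
```

Because beta and SE are rounded to two or three decimals in the table, the recomputed P-values differ from the reported ones by a small factor while remaining well below the genome-wide threshold.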
Summary of novel association signals.
| Variant | Cohorts | Trait | Chr:pos (EA/NEA) | Cohort/array | EAF | Beta (SE) | P-value | MAC (N) | Meta-analysis EAF | Meta-analysis beta (SE) | Meta-analysis P-value | Overall MAC (N) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr16:70790626 | MANOLIS | HDL | 16:70790626 (T/C) | MANOLIS CoreExome | 0.003 | −1.885 (0.994) | 5.76 × 10⁻² | 1.26 (210) | 0.006 | −1.713 (0.254) | 1.57 × 10⁻¹¹ | 20 (1476) |
| | | | | MANOLIS OmniExome | 0.007 | −1.702 (0.263) | 1.81 × 10⁻¹⁰ | 17.6 (1255) | | | | |
| rs145556679 | MANOLIS | TG | 11:117643264 (C/G) | MANOLIS CoreExome | 0.005 | −1.293 (0.729) | 7.85 × 10⁻² | 2.09 (209) | 0.013 | −1.134 (0.17) | 2.53 × 10⁻¹¹ | 49 (1476) |
| | | | | MANOLIS OmniExome | 0.014 | −1.125 (0.175) | 1.70 × 10⁻¹⁰ | 35.1 (1252) | | | | |
| | | VLDL | | MANOLIS CoreExome | 0.005 | −1.365 (0.727) | 6.21 × 10⁻² | 2.1 (210) | 0.013 | −1.131 (0.17) | 2.90 × 10⁻¹¹ | |
| | | | | MANOLIS OmniExome | 0.014 | −1.118 (0.175) | 2.29 × 10⁻¹⁰ | 35.1 (1253) | | | | |
| rs140087759 | MANOLIS | WHR | 5:28292892 (T/C) | MANOLIS CoreExome | 0.015 | 1.676 (0.411) | 5.92 × 10⁻⁵ | 6.12 (204) | 0.01 | 1.189 (0.209) | 1.35 × 10⁻⁸ | 31 (1476) |
| | | | | MANOLIS OmniExome | 0.009 | 1.02 (0.243) | 2.90 × 10⁻⁵ | 18.8 (1047) | | | | |
| rs13382259 | | DBP | 2:113934176 (T/A) | Pomak CoreExome | 0.047 | 0.509 (0.126) | 6.98 × 10⁻⁵ | 60.3 (641) | 0.043 | 0.554 (0.1) | 3.18 × 10⁻⁸ | 172 (1737) |
| | | | | Pomak OmniExome | 0.039 | 0.629 (0.164) | 1.36 × 10⁻⁴ | 43 (551) | | | | |
| rs6131100 | | FGBMIadj | 20:10434530 (A/T) | Pomak CoreExome | 0.038 | −0.573 (0.16) | 3.62 × 10⁻⁴ | 43.2 (569) | 0.037 | −0.79 (0.139) | 1.21 × 10⁻⁸ | 135 (1737) |
| | | | | Pomak OmniExome | 0.035 | −1.454 (0.279) | 7.12 × 10⁻⁷ | 12.2 (174) | | | | |
| rs79748197 | Pomak | WBC | 2:19430105 (G/A) | Pomak CoreExome | 0.004 | −1.242 (0.403) | 2.12 × 10⁻³ | 5.8 (725) | 0.008 | −1.156 (0.209) | 3.00 × 10⁻⁸ | 31 (1737) |
| | | | | Pomak OmniExome | 0.004 | −1.125 (0.243) | 4.14 × 10⁻⁶ | 20.9 (948) | | | | |
| rs557129696 | Pomak | HGB | 11:5328683 (G/T) | Pomak CoreExome | 0.002 | −1.95 (0.606) | 1.36 × 10⁻³ | 2.87 (717) | 0.004 | −2.027 (0.308) | 4.83 × 10⁻¹¹ | 13 (1737) |
| | | | | Pomak OmniExome | 0.005 | −2.054 (0.358) | 1.30 × 10⁻⁸ | 9.45 (945) | | | | |
| rs112037309 | | Weight | 4:106617136 (A/G) | MANOLIS | 0.075 | 0.295 (0.078) | 1.43 × 10⁻⁴ | 189.8 (1258) | 0.075 | 0.287 (0.052) | 2.70 × 10⁻⁸ | 485 (3213) |
| | | | | Pomak | 0.075 | 0.28 (0.07) | 5.96 × 10⁻⁵ | 250.8 (1672) | | | | |
All variants are intronic with the exception of rs140087759, which is intergenic; variant consequences are taken from Ensembl (http://www.ensembl.org). For the internal replication the software used was GEMMA, with the exception of rs112037309, for which METACARPA was used. Cohorts, cohorts from which the signal arose; Chr:pos, chromosome and position in GRCh37/hg19 coordinates; EA, effect allele; NEA, non-effect allele; EAF, effect allele frequency; P-value, the likelihood ratio test P-value from GEMMA or Wald test P-value from METACARPA; MAC, minor allele count for samples in the analysis; Overall MAC, minor allele count for all samples in the cohorts from which the signal arose, established using the rounded imputed allele dosages from SNPTEST (https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html); N, sample size; HDL, high-density lipoprotein cholesterol; DBP, diastolic blood pressure; TG, triglycerides; VLDL, very low-density lipoprotein cholesterol; FGBMIadj, fasting glucose adjusted for body mass index; HGB, haemoglobin; WBC, white blood cells; WHR, waist-to-hip ratio.
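The per-analysis MAC values in the table are consistent with the expected minor allele count in N diploid samples, 2 × N × MAF. The sketch below checks this against three table rows. It assumes that the minor allele frequency is the smaller of EAF and 1 − EAF; the function name `expected_mac` is hypothetical. It does not apply to the Overall MAC column, which the legend says was derived from rounded imputed dosages.

```python
def expected_mac(eaf, n):
    """Expected minor allele count in n diploid samples.

    Assumes the minor allele frequency is min(eaf, 1 - eaf).
    """
    maf = min(eaf, 1.0 - eaf)
    return 2 * n * maf

# (label, EAF, N, MAC as printed in the table)
rows = [
    ("chr16:70790626, MANOLIS CoreExome", 0.003, 210, 1.26),
    ("chr16:70790626, MANOLIS OmniExome", 0.007, 1255, 17.6),
    ("rs145556679 (TG), MANOLIS CoreExome", 0.005, 209, 2.09),
]
for label, eaf, n, reported in rows:
    print(f"{label}: expected {expected_mac(eaf, n):.2f}, reported {reported}")
```

Fractional counts arise because the allele frequencies come from imputed dosages rather than hard genotype calls.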
*At least one proxy is present in the International HapMap project data (http://hapmap.ncbi.nlm.nih.gov). Proxies were determined using LD (r²>0.8 in the cohorts used for the meta-analysis) for each novel variant. If a proxy was in HapMap it also had high LD (r²>0.9) with the variant in the 1000 Genomes Project CEU population (ref. 3). LocusZoom was used to create the regional plots (http://csg.sph.umich.edu/locuszoom/).
Figure 5. Association results for chr16:70790626 and rs145556679 and lipid levels.
(a) Heterozygotes for chr16:70790626 exhibit significantly lower HDL levels than homozygotes (Wald test METACARPA P=1.57 × 10⁻¹¹). (b) Heterozygotes for rs145556679 exhibit significantly lower TG (Wald test METACARPA P=2.53 × 10⁻¹¹) and VLDL (Wald test METACARPA P=2.90 × 10⁻¹¹) levels than homozygotes. (c) Regional association plot for chr16:70790626. (d) To determine if the signals are detected without MANOLIS sequences in the reference panel, we conducted imputation using a combined UK10K+1000 Genomes reference panel; the regional plot shows that the chr16:70790626 signal is captured with a different lead variant and a decrease in significance. (e) Regional association plot for rs145556679. (f) Regional association plot for rs145556679 using a combined UK10K+1000 Genomes reference panel; the same signal is captured with a different lead variant and a decrease in association strength. LocusZoom was used to create the regional plots (http://csg.sph.umich.edu/locuszoom/).