Elizabeth T Cirulli, Simon White, Robert W Read, Gai Elhanan, William J Metcalf, Francisco Tanudjaja, Donna M Fath, Efren Sandoval, Magnus Isaksson, Karen A Schlauch, Joseph J Grzymski, James T Lu, Nicole L Washington.
Abstract
Understanding the impact of rare variants is essential to understanding human health. We analyze rare (MAF < 0.1%) variants against 4264 phenotypes in 49,960 exome-sequenced individuals from the UK Biobank and 1934 phenotypes (1821 overlapping with UK Biobank) in 21,866 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. After using our rare-variant-tailored methodology to reduce test statistic inflation, we identify 64 statistically significant gene-based associations in our meta-analysis of the two cohorts and 37 for phenotypes available in only one cohort. Singletons make significant contributions to our results, and the vast majority of the associations could not have been identified with a genotyping chip. Our results are available for interactive browsing in a webapp (https://ukb.research.helix.com). This comprehensive analysis illustrates the biological value of large, deeply phenotyped cohorts of unselected populations coupled with NGS data.
Year: 2020 PMID: 31992710 PMCID: PMC6987107 DOI: 10.1038/s41467-020-14288-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Study and cohort information.
| | UK Biobank (UKB) | Healthy Nevada Project (HNP) |
|---|---|---|
| N individuals: total/European ancestry | 49,960/40,468 | 21,866/17,238 |
| N phenotypes: binary/quantitative | 3014/1250 | 1784/150 |
| N phenotypes unique to cohort: binary/quantitative | 1240/1203 | 10/103 |
| Median N cases for binary traits [range] | All: 173 [5:46,192] Eur: 139 [1:37,983] | All: 153 [5:11,779] Eur: 126 [2:9,426] |
| Median N phenotyped for quantitative traits [range] | All: 10,735 [635:49,904] Eur: 9287 [516:40,428] | All: 2339 [343:19,698] Eur: 1924 [290:15,542] |
Fig. 1 Gene-based collapsing analysis.
a First, variants in each gene are identified by sequencing. b Variants that are predicted to be damaging—those that are rare and annotated as likely to affect the functionality of the gene, such as coding variants—are then selected for analysis. c Finally, the number of cases with a qualifying variant in each gene is compared with the number of controls with a qualifying variant, producing one statistical result per gene instead of one per variant.
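The collapsing step in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the input layout (a mapping from `(gene, variant_id)` to carrier IDs) and the pure-Python two-sided Fisher's exact test are all hypothetical choices made for the example.

```python
from math import comb


def collapse_by_gene(variant_carriers, qualifying):
    """Return gene -> set of individuals carrying at least one qualifying variant.

    variant_carriers: dict mapping (gene, variant_id) -> set of individual IDs
    qualifying: set of (gene, variant_id) pairs that passed the rarity and
                annotation filters (hypothetical input format)
    """
    carriers = {}
    for (gene, variant_id), people in variant_carriers.items():
        if (gene, variant_id) in qualifying:
            carriers.setdefault(gene, set()).update(people)
    return carriers


def carrier_table(gene_carriers, cases, controls):
    """2x2 counts: (case carriers, case non-carriers, ctrl carriers, ctrl non-carriers)."""
    a = len(gene_carriers & cases)
    c = len(gene_carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_carriers)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(n_cases, x) * comb(n - n_cases, n_carriers - x) / denom

    lo = max(0, n_carriers - (n - n_cases))
    hi = min(n_cases, n_carriers)
    p_obs = prob(a)
    # Sum every table that is at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

Each gene yields one 2x2 table and therefore one p value, regardless of how many qualifying variants it contains.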
Fig. 2 Histogram of number of qualifying variants per gene in European UKB cohort.
a Number of qualifying coding variants per gene. Eleven genes with >500 variants were excluded from the plot. The median number of variants per gene is 34 (range [1:2833]). b Number of qualifying coding variants per coding nucleotide of each gene. Sixteen genes with values >0.2 were excluded from the plot. The median number of variants per nucleotide is 0.027 (range [0.0001:0.991]). c Number of qualifying loss of function (LoF) variants per gene. Six genes with >50 variants were excluded from the plot. The median number of variants per gene is six (range [1:178]). d Number of qualifying LoF variants per coding nucleotide of each gene. Nine genes with values >0.05 were excluded from the plot. The median number of variants per nucleotide is 0.005 (range [9.5 × 10^−5:0.25]). Plots for all ancestries and the HNP cohort can be found in Supplementary Fig. 1.
Fig. 3 Overlaid QQ plots for the coding model with the phenotype atrial fibrillation.
This phenotype has a 1:22 case:control ratio. Shown are the results for a linear mixed model (LMM) meta-analysis of all European ancestry individuals with no minimum number of variant carriers required (black), with at least ten case carriers observed (red), and with at least ten case carriers expected in the case group based on the overall frequency (cyan), as well as a Fisher’s exact test (FET) of unrelated European ancestry individuals with all genes included (blue). The second-to-last condition (at least ten expected case carriers, cyan) is the requirement we set for our main analysis results. The one significant association is TTN, known from previous studies to be involved in phenotypes related to atrial fibrillation[28]. This association is significant (meta-analysis p < 3.4 × 10^−10) in the LMM analysis, but it is difficult to distinguish from test statistic inflation without using the cutoff of ten expected case carriers (cyan). There is no inflation in the Fisher’s exact test of unrelated individuals, but this association is not significant in that analysis.
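The expected-case-carrier requirement described in this caption can be illustrated with a short Python sketch. It assumes that "expected" means the number of cases multiplied by the overall carrier frequency in the analyzed sample; the function names and the example counts below are hypothetical and are not taken from the study.

```python
def expected_case_carriers(n_carriers, n_cases, n_total):
    """Case carriers expected if carriers were spread evenly across cases and controls."""
    return n_cases * n_carriers / n_total


def passes_cutoff(n_carriers, n_cases, n_total, min_expected=10):
    """True when a gene meets the minimum expected number of case carriers."""
    return expected_case_carriers(n_carriers, n_cases, n_total) >= min_expected


# Hypothetical binary trait with a 1:22 case:control ratio.
n_cases, n_total = 1000, 23000
for n_carriers in (100, 230, 300):
    print(n_carriers, expected_case_carriers(n_carriers, n_cases, n_total),
          passes_cutoff(n_carriers, n_cases, n_total))
```

Because the filter depends only on the overall carrier count and the case fraction, it does not look at the association signal itself, unlike a cutoff on observed case carriers.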
Statistically significant associations from the European ancestry analysis.
| Method | Lead gene | Lead phenotype | Lead model | UKB carrier n (%) | UKB p | HNP carrier n (%) | HNP p | Meta p |
|---|---|---|---|---|---|---|---|---|
| LMM | | Alkaline phosphatase^a | Coding | 257 (0.68%) | 2.4 × 10^−186 | 68 (0.65%) | 2.5 × 10^−39 | 5.3 × 10^−223 |
| LMM | | Urate^a | Coding | 314 (0.83%) | 3.3 × 10^−108 | 8 (0.46%) | 2.1 × 10^−4 | 5.5 × 10^−111 |
| LMM | | Aspartate aminotransferase | Coding | 113 (0.3%) | 5.4 × 10^−58 | 28 (0.27%) | 2.2 × 10^−13 | 1.6 × 10^−69 |
| LMM | | HDL cholesterol^a | Coding | 449 (1.26%) | 2.9 × 10^−35 | 88 (0.98%) | 2.3 × 10^−10 | 4.9 × 10^−44 |
| LMM | | LDL direct^a | LoF | 96 (0.25%) | 8.9 × 10^−31 | 8 (0.09%) | 9.7 × 10^−11 | 8.5 × 10^−40 |
| LMM | | Alanine aminotransferase | Coding | 126 (0.33%) | 1.9 × 10^−28 | 40 (0.39%) | 2.4 × 10^−6 | 4.0 × 10^−33 |
| FET | | Thalassaemia | LoF | 1 (25%)/0 (0%)^b | 1.2 × 10^−4 | 8 (47.06%)/5 (0.03%)^b | 1.7 × 10^−21 | 1.2 × 10^−24 |
| LMM | | Platelet count^a | Coding | 233 (0.59%) | 1.2 × 10^−15 | 71 (0.66%) | 2.9 × 10^−7 | 2.8 × 10^−21 |
| FET | | D45 Polycythaemia vera | Coding | 13 (36.11%)/209 (0.61%)^b | 1.7 × 10^−19 | 3 (10.34%)/77 (0.53%)^b | 5.4 × 10^−4 | 1.7 × 10^−19 |
| LMM | | Mean corpuscular haemoglobin | LoF | 27 (0.07%) | 5.0 × 10^−15 | 4 (0.04%) | 8.9 × 10^−4 | 2.3 × 10^−17 |
| LMM | | Triglycerides | LoF | 42 (0.11%) | 2.0 × 10^−12 | 10 (0.11%) | 1.1 × 10^−2 | 1.0 × 10^−13 |
| LMM | | Mean platelet volume^a | Coding | 85 (0.22%) | 5.2 × 10^−11 | 28 (0.27%) | 1.5 × 10^−3 | 3.1 × 10^−13 |
| LMM | | Cholesterol | Coding | 175 (0.46%) | 3.2 × 10^−10 | 67 (0.62%) | 6.8 × 10^−4 | 8.8 × 10^−13 |
| LMM | | Glycated haemoglobin | Coding | 58 (0.15%) | 3.0 × 10^−12 | 9 (0.17%) | 8.4 × 10^−2 | 9.3 × 10^−13 |
| LMM | | I48 Atrial fibrillation and flutter | LoF | 41 (2.38%)/311 (0.8%)^b | 1.1 × 10^−11 | 12 (1.48%)/132 (0.8%)^b | 2.8 × 10^−2 | 7.6 × 10^−12 |
| FET | | R31 Unspecified haematuria | LoF | 15 (1.2%)/51 (0.15%)^b | 1.1 × 10^−8 | 5 (0.47%)/7 (0.05%)^b | 1.0 × 10^−3 | 1.2 × 10^−10 |
| FET | | Z40.0 Prophylactic surgery for malignant neoplasm risk-factors | LoF | 7 (12.28%)/154 (0.45%)^b | 1.1 × 10^−8 | 2 (9.52%)/56 (0.38%)^b | 3.1 × 10^−3 | 1.4 × 10^−10 |
| LMM | | Cystatin C | Coding | 56 (0.15%) | 9.6 × 10^−52 | | | |
| LMM | | SHBG | Coding | 149 (0.42%) | 1.1 × 10^−33 | | | |
| LMM | | Non-cancer illness code, self-reported: high cholesterol | Coding | 79 (1.54%)/173 (0.49%)^b | 1.9 × 10^−18 | | | |
| LMM | | Hair colour: Blonde | Coding | 54 (1.16%)/112 (0.31%)^b | 1.2 × 10^−17 | | | |
| LMM | | Median T2star in putamen (right) | LoF | 38 (0.4%) | 2.1 × 10^−14 | | | |
| LMM | | Mean platelet volume^c | Coding | 33 (0.08%) | 3.0 × 10^−12 | | | |
| LMM | | 6 mm weak meridian (right) | LoF | 30 (0.11%) | 3.1 × 10^−11 | | | |
| LMM | | C-reactive protein | Coding | 66 (0.17%) | 6.6 × 10^−11 | | | |
| LMM | | Hair colour: Red | Coding | 31 (1.75%)/222 (0.57%)^b | 2.2 × 10^−10 | | | |
D45, I48 and R31 refer to ICD-10-CM diagnosis codes; Median T2star is a measurement from a brain MRI; 6 mm weak meridian (right) is from keratometry of the right eye. When multiple phenotypes and/or models (coding, LoF) were significantly associated with a gene, only the lead phenotype and model are shown. When multiple genes were associated with a trait, only the top gene is shown. All results can be found in Supplementary Data 2.
LMM, linear mixed model; FET, Fisher’s exact test; LoF, loss of function; HDL, high density lipoprotein; LDL, low density lipoprotein; SHBG, sex hormone binding globulin.
^a Additional genes associated with alkaline phosphatase include GPLD1, ASGR1, and ABCB11; with HDL cholesterol include LCAT, CETP, and SCARB1; with LDL direct include PCSK9; with platelet count include MPL and ITGA2B; with urate include SLC2A9; and with mean platelet volume include IQGAP2, GFI1B, and GP1BA.
^b For binary traits, the information shown is case n (%)/ctrl n (%).
^c Although this phenotype was included in the meta-analysis, this particular gene did not have carriers in the HNP cohort.
Statistically significant associations from the mixed ancestry analysis.
| Gene | Lead model | Lead phenotype | UKB carrier n (%) | UKB p | HNP carrier n (%) | HNP p | Meta p | Eur carrier n (%)^a,b | Eur p |
|---|---|---|---|---|---|---|---|---|---|
| | Coding | Total bilirubin | 90 (0.19%) | 3.4 × 10^−14 | 19 (0.15%) | 1.5 × 10^−2 | 4.3 × 10^−15 | 77 (0.16%) | 5.5 × 10^−9 |
| | Coding | Albumin | 72 (0.16%) | 4.9 × 10^−12 | 28 (0.22%) | 2.4 × 10^−2 | 8.5 × 10^−13 | 72 (0.16%) | 9.2 × 10^−8 |
| | Coding | Mean corpuscular haemoglobin | 389 (0.8%) | 1.5 × 10^−11 | 105 (0.78%) | 5.0 × 10^−2 | 5.8 × 10^−12 | 369 (0.73%) | 6.3 × 10^−9 |
| | Coding | Total bilirubin | 531 (1.14%) | 6.7 × 10^−10 | 135 (1.06%) | 1.6 × 10^−2 | 4.5 × 10^−11 | 574 (1.18%) | 1.4 × 10^−8 |
| | Coding | Hair colour: Blonde | 32 (0.61%)/65 (0.15%)^a | 1.5 × 10^−14 | | | | 31 (0.66%)/46 (0.13%)^a | 2.5 × 10^−15 ^c |
| | Coding | Hair colour: Blonde | 65 (1.24%)/231 (0.52%)^a | 3.4 × 10^−12 | | | | 55 (1.18%)/180 (0.5%)^a | 6.6 × 10^−9 |
| | Coding | Red blood cell distribution width | 341 (0.7%) | 1.9 × 10^−10 | | | | 279 (0.71%) | 3.4 × 10^−10 |
All results shown are from the LMM with all ethnicities. When multiple phenotypes and/or models (coding, LoF) were significantly associated with a gene, only the lead phenotype and model are shown. All results can be found in Supplementary Data 2.
^a For binary traits, the information shown is case n (%)/ctrl n (%).
^b Eur carrier and Eur p value columns: For the phenotypes measured in both cohorts, the European meta-analysis values are shown. For the phenotypes measured only in UKB (blank for HNP), the UKB Eur values are shown.
^c While OCA2 was significantly associated with hair colour in the European ancestry subset, that subset had only nine expected case carriers, and so it failed our screening. In the Fisher’s exact test in unrelated individuals with no carrier cutoff, the p value is 1.3 × 10^−8.
Fig. 4 Distribution of effects of rare variants in select genes in the UKB cohort.
a SLC2A9 protein and urate levels. The legend shows the gene, its associated phenotype, and the effect size (beta). The effect size is computed from the gene-based collapsing model, in which individuals were coded as either having or not having a qualifying variant. A positive value indicates that variant carriers have, on average, higher values for the phenotype, while a negative value indicates that variant carriers have lower values. The amino acid positions are shown on the x-axis, with the PFAM domain highlighted. The y-axis displays the beta of each individual variant, with negative values shown below and positive values above the horizontal axis. Variants are indicated according to their consequence as shown and labelled according to their amino acid change or splice site variation. The number inside the circle is the number of people carrying that variant. Darker lines connecting the variants to the gene and darker-filled shapes indicate more significant p values for the association. b Membrane topology plot of SLC2A9 showing variants with positive effect size (green) on urate levels and variants with negative effect size (pink). SLC2A9 (Glut9) reabsorbs urate in the proximal tubules of the kidneys. Variants that disrupt the transmembrane regions or lower gene expression are known to be associated with hypouricemia[29]. Here, 88% of the variants with negative betas, associated with lowered urate levels, are in or directly adjacent to a predicted transmembrane region, as opposed to only 55% of the variants with positive effect size. c GFI1B protein and mean platelet volume. Consistent with the literature, variants in the zinc finger domains are associated with increased platelet volumes, but we make the observation that some variants in between zinc fingers 3 and 4 may be having an effect in the opposite direction[30,31]. d ASGR1 protein and alkaline phosphatase levels. 
In addition to the known effects of LoF variants, we show that missense variants are also playing a role[32]. Plots of the other significantly associated genes are included in Supplementary Fig. 3.
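The sign convention for the effect size (beta) in this caption can be illustrated with a simplified Python sketch. It fits an ordinary least-squares slope on a 0/1 carrier indicator with no covariates and no relatedness correction, so it is a stand-in for illustration only and not the linear mixed model used in the analysis. The function name and the toy phenotype values are hypothetical.

```python
from statistics import mean


def carrier_beta(phenotype, is_carrier):
    """OLS slope of a quantitative phenotype on a 0/1 carrier indicator.

    With a single binary predictor this equals
    mean(phenotype of carriers) - mean(phenotype of non-carriers).
    """
    x = [1.0 if c else 0.0 for c in is_carrier]
    x_bar, y_bar = mean(x), mean(phenotype)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotype))
    return sxy / sxx


# Toy data: the two carriers have lower values, so beta is negative.
values = [5.0, 5.2, 4.8, 3.0, 3.2]
carrier = [False, False, False, True, True]
beta = carrier_beta(values, carrier)
```

A negative `beta` here corresponds to the caption's case in which variant carriers have, on average, lower values for the phenotype.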