Marcin Kierczak, Nima Rafati, Julia Höglund, Hadrien Gourlé, Valeria Lo Faro, Daniel Schmitz, Weronica E Ek, Ulf Gyllensten, Stefan Enroth, Diana Ekman, Björn Nystedt, Torgny Karlsson, Åsa Johansson.
Abstract
Despite the success of genome-wide association studies, much of the genetic contribution to complex traits remains unexplained. Here, we analyse high-coverage whole-genome sequencing data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants is skewed towards the rare spectrum, and damaging variants are more often rare. We estimate that less than 4.3% of the narrow-sense heritability is expected to be explained by rare variants in our cohort. Using a gene-based approach, we identify Cis-associations for 237 of the proteins, slightly more than with a GWAS (N = 213), and we identify 34 associated loci in Trans. Several associations are driven by rare variants, which on average have larger effects. We therefore conclude that rare variants could be of importance for precision medicine applications but make a more limited contribution to the missing heritability of complex diseases.
Year: 2022 PMID: 35534486 PMCID: PMC9085767 DOI: 10.1038/s41467-022-30208-8
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 Distribution of MAF, CADD/Eigen values, and fraction of variance across MAF-bins for SNVs and indels.
In all panels, the dark grey area indicates the rare variants, as defined in our analyses (MAF ≤ 0.0239), and the light grey area indicates the common variants (MAF > 0.0239). a The MAF distribution of the variants identified in the NSPHS. The bars represent the proportion of variants with a MAF within each frequency bin. b Distribution of per-sample allele counts for different MAF-bins. The bars represent the proportion of alleles per sample belonging to each MAF-bin. Averages across all samples are shown (statistics were derived from the N = 1021 samples with WGS data), and the error bars represent the 95% width of the distribution in the cohort. c The fraction of SNVs and indels that are rare vs. common for different CADD and Eigen values. d The proportion of genotype variance that can be attributed to variants within each MAF-bin. Each bar represents the sum of the genotype variances of all variants within the MAF-bin divided by the sum of genotype variances across all variants. e, f Proportion of additive genetic variance (narrow-sense heritability) that can be attributed to variants in different MAF-bins when allelic effect sizes are weighted by e CADD values and f Eigen values.
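Panel d divides each MAF-bin's summed genotype variance by the total over all variants. The sketch below illustrates that ratio under an assumption the caption does not state: each variant's genotype variance is taken as its Hardy-Weinberg expectation, 2p(1 − p), rather than the empirical variance of the observed genotypes. The function name and the toy MAFs are hypothetical.

```python
def genotype_variance_by_bin(mafs, bin_edges):
    """Fraction of total genotype variance contributed by each MAF bin.

    Assumes Hardy-Weinberg equilibrium, so a biallelic variant with
    minor allele frequency p has genotype variance 2p(1 - p).
    bin_edges are ascending upper bounds; a variant is assigned to the
    first bin whose upper bound is >= its MAF (MAFs never exceed 0.5).
    """
    totals = [0.0] * len(bin_edges)
    for p in mafs:
        variance = 2 * p * (1 - p)
        for k, upper in enumerate(bin_edges):
            if p <= upper:
                totals[k] += variance
                break
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical toy MAFs: three rare and two common variants,
# split at the rare/common cut-off used in the caption (0.0239).
mafs = [0.001, 0.001, 0.01, 0.2, 0.4]
rare_frac, common_frac = genotype_variance_by_bin(mafs, [0.0239, 0.5])
print(f"rare: {rare_frac:.3f}, common: {common_frac:.3f}")  # rare: 0.029, common: 0.971
```

Even though three of the five toy variants are rare, they contribute under 3% of the genotype variance, because 2p(1 − p) is very small when p is small.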
Fig. 2 MAFs and effect sizes for the lead GWAS SNVs and indels.
a Distribution of MAFs for the lead GWAS-significant (Wald-test, p < 5.00 × 10−8) primary and conditional hits. A MAF threshold of 0.01 was used in the GWAS; consequently, no GWAS hits had a MAF below 0.01. b Effect sizes from the GWAS in relation to MAF for the primary and conditional GWAS hits. All effect estimates are reported as absolute values.
Overview and summary statistics for the five types of SNV-sets analysed and the number of significant loci identified with each of the seven SKAT models used.
| SNV-set | No. of SNV-sets (a) | No. of SNVs (b) | Rare (c) | Model 1 (d) | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 405 | 61 [31–116] | 47.0% | 158 | 154 (g) | 83 | 161 | 175 | 178 | 40 |
|  | 405 | 190 [118–307] | 48.4% | 148 | 148 (g) | 80 | 151 | 167 | 176 | 45 |
|  | 405 | 8 [5–13] | 49.1% | 138 | 132 (h) | 87 | 150 | 158 | 167 | 43 |
|  | 18,467 | 8 [5–14] | 54.2% | 19 | 26 (h) | 6 | 21 | 24 | 26 | 3 |
|  | 18,467 | 229 [161–330] | 51.3% | 19 | 20 (g, h) | 8 | 21 | 26 | 26 | 5 |
(a) For Cis-SNV-sets, each of the 405 autosomal SNV-sets was analysed only in relation to the protein encoded by the corresponding gene, whereas each Trans-SNV-set (one for each of the 18,467 genes across the genome) was analysed in relation to all 414 proteins. The significance threshold was 0.05/(3 Cis-sets × 405 proteins × 7 models) = 5.88 × 10−6 for Cis, and 0.05/(2 Trans-sets × 414 proteins × 18,467 SNV-sets × 7 models) = 4.67 × 10−10 for Trans.
(b) Median [interquartile range] of the number of SNVs in the SNV-sets.
(c) Fraction of SNVs and indels in the SNV-sets that were considered rare (MAF < 0.0239).
(d) The seven models are: model (1) Unweighted; model (2) CADD or Eigen weighted; model (3) MAF weighted, β(1, 25); model (4) MAF weighted, β(1, 5); model (5) MAF weighted, β(0.5, 0.5); model (6) CommonRare; and model (7) Rare only. See the Methods section for more information on the models and Supplementary Fig. 7 for information on the β-distributions.
(e) Each gene ± 100 kb up- and downstream, filtered by Eigen > 10 when analysed in Cis.
(f) Each gene ± 100 kb up- and downstream, filtered by CADD or Eigen > 10 when analysed in Trans.
(g) Weighted by Eigen values.
(h) Weighted by CADD values.
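The Cis and Trans significance thresholds in the table footnotes are Bonferroni corrections: the family-wise α of 0.05 is divided by the product of all test counts. The hypothetical helper below simply reproduces that arithmetic.

```python
def bonferroni_threshold(alpha: float, *n_tests: int) -> float:
    """Divide the family-wise alpha by the product of all test counts."""
    total = 1
    for n in n_tests:
        total *= n
    return alpha / total


# Cis: 3 SNV-set types x 405 proteins x 7 SKAT models
cis = bonferroni_threshold(0.05, 3, 405, 7)
# Trans: 2 SNV-set types x 414 proteins x 18,467 genes x 7 SKAT models
trans = bonferroni_threshold(0.05, 2, 414, 18_467, 7)

print(f"Cis:   {cis:.2e}")    # Cis:   5.88e-06
print(f"Trans: {trans:.2e}")  # Trans: 4.67e-10
```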
Fig. 3 Overview of the SNV-sets and SKAT tests performed, and the overlap between the results.
a Overview of the SNV-sets, the SKAT tests performed, and the overlap between the results for the different SNV-sets. The Venn diagram shows the number of loci with any significant SKAT association that overlap between the different SNV-sets. For Cis-associations, a p-value of 5.88 × 10−6 was used as the significance threshold, while for Trans-associations a threshold of 4.67 × 10−10 was adopted. b Fraction of loci identified by the different models within each SNV-set. A total of 198, 190, 182, 33, and 27 loci were identified with the five SNV-sets, respectively (N in the legend). Each bar represents the fraction of these N loci that were significant for a given SKAT model. The seven models are: (1) Unweighted; (2) CADD or Eigen weighted; (3) MAF weighted, β(1, 25); (4) MAF weighted, β(1, 5); (5) MAF weighted, β(0.5, 0.5); (6) CommonRare; (7) Rare only.
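To show what the MAF-weighted models vary, here is a minimal sketch of a SKAT-style score statistic, assuming the weighting convention of the original SKAT method, where each variant's weight satisfies √w_j = Beta(MAF_j; a, b), so that Q = Σ_j w_j (Σ_i g_ij r_i)². Here r are residuals from a covariates-only null model. The function names are hypothetical, the CADD/Eigen weighting (model 2) and the CommonRare model (6) are not shown, and the p-value step (typically Davies' method for a mixture of χ² variables, as in reference implementations such as the SKAT R package) is omitted.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def skat_q(genotypes, residuals, mafs, a=1.0, b=25.0):
    """SKAT-style variance-component score statistic (no p-value).

    genotypes: one genotype vector (0/1/2 per sample) per variant
    residuals: y minus fitted values from the null (covariates-only) model
    mafs: minor allele frequency of each variant (must be > 0)
    Uses sqrt(w_j) = Beta(MAF_j; a, b), i.e. w_j = Beta(MAF_j; a, b) ** 2.
    """
    q = 0.0
    for g, maf in zip(genotypes, mafs):
        score = sum(gi * ri for gi, ri in zip(g, residuals))
        q += beta_pdf(maf, a, b) ** 2 * score ** 2
    return q


# Compare the beta weights of models 3-5 at a few MAFs.
for a, b in [(1, 25), (1, 5), (0.5, 0.5)]:
    weights = [beta_pdf(maf, a, b) for maf in (0.001, 0.01, 0.1, 0.4)]
    print(f"Beta({a}, {b}):", [round(w, 4) for w in weights])
```

β(1, 25) down-weights common variants far more steeply than β(1, 5) or β(0.5, 0.5). Note also that β(1, 1) has density 1 everywhere, so it reproduces the unweighted model (1).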
Fig. 4 Overlap between the identified loci for the different SNV-sets and the GWAS.
The number of loci is the total number of independent loci identified with each SNV-set or the GWAS, and the overlap size is the number of loci that overlap between SNV-sets and the GWAS for a Cis-associations and b Trans-associations. For Cis-associations, a p-value (SKAT-test) of 5.88 × 10−6 was used as the significance threshold for the SKAT analyses, and a p-value (Wald-test) of 3.00 × 10−8 for the GWAS. For Trans-associations, a p-value (SKAT-test) of 4.67 × 10−10 was used as the significance threshold for the SKAT analyses, and a p-value (Wald-test) of 3.92 × 10−11 for the GWAS.
Fig. 5 Regional plots for three proteins.
The grey circles are the −log10 p-values (Wald-test) from the GWAS. Horizontal lines indicate the −log10 p-values (SKAT-test) for the Trans-Flank-sets in black and the Trans-CDS-sets in blue. Multiple, partly overlapping Trans-Flank-sets were analysed. a Results for TNFRSF10C levels, b VWC2 levels, and c NEP levels.
Fig. 6 Overlap between the associations identified with the different SKAT models.
The bars to the left represent the total number of associations per model, and the bars at the top (overlap size) represent the number of associations that overlap between the different models. a The small SNV-sets (CDS-sets and Reg-sets) in the NSPHS, b the larger Flank-SNV-sets, and c the CDS-sets in the UKB.
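Counts of this kind can be derived from the per-model sets of significant associations. The sketch below assumes an UpSet-style layout in which each association is counted once, under the exact combination of models that detected it (exclusive intersections), and that an association is a (protein, locus) pair; both are assumptions, and all names and data are hypothetical.

```python
from collections import Counter


def exclusive_intersections(hits_by_model):
    """Count associations per exact combination of models (UpSet-style).

    hits_by_model maps a model name to the set of associations that were
    significant under that model. Each association is counted once,
    under the full set of models that detected it.
    """
    membership = {}
    for model, hits in hits_by_model.items():
        for hit in hits:
            membership.setdefault(hit, set()).add(model)
    return Counter(frozenset(models) for models in membership.values())


# Hypothetical (protein, gene) associations per SKAT model.
hits = {
    "Unweighted": {("P1", "G1"), ("P2", "G2"), ("P3", "G3")},
    "MAF beta(1, 25)": {("P2", "G2")},
    "Rare only": {("P3", "G3"), ("P4", "G4")},
}

totals = {model: len(h) for model, h in hits.items()}  # left-hand bars
for combo, size in exclusive_intersections(hits).items():  # top bars
    print(sorted(combo), size)
```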