Arthur Gilly [1,2], Young-Chan Park [2,3], Grace Png [1,2], Andrei Barysenka [1], Iris Fischer [1], Thea Bjørnland [2,4], Lorraine Southam [1,2,5], Daniel Suveges [2,6], Sonja Neumeyer [1], N William Rayner [1,2,7,8], Emmanouil Tsafantakis [9], Maria Karaleftheri [10], George Dedoussis [11], Eleftheria Zeggini [12,13,14].
Abstract
The human proteome is a crucial intermediate between complex diseases and their genetic and environmental components, and an important source of drug development targets and biomarkers. Here, we comprehensively assess the genetic architecture of 257 circulating protein biomarkers of cardiometabolic relevance through high-depth (22.5×) whole-genome sequencing (WGS) in 1328 individuals. We discover 131 independent sequence variant associations (P < 7.45 × 10−11) across the allele frequency spectrum, all of which replicate in an independent cohort (n = 1605, 18.4× WGS). For the first time, we identify replicating evidence for rare-variant cis-acting protein quantitative trait loci (cis-pQTLs) for five genes, involving both coding and noncoding variation. We construct and validate polygenic scores that explain up to 45% of protein level variation. We find causal links between protein levels and disease risk, identifying high-value biomarkers and drug development targets.
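A polygenic score for a protein is typically a weighted sum of allele dosages, and the share of protein-level variation it explains is commonly reported as the R² between score and measured level in data not used to fit the weights. The Python sketch below illustrates that idea on simulated genotypes. The function names, the per-variant marginal-regression weights and the 50/50 discovery/validation split are illustrative assumptions, not the scoring method used in the study.

```python
import numpy as np


def marginal_effects(dosages: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-variant simple-regression slopes of y on each dosage column (hypothetical weighting)."""
    x = dosages - dosages.mean(axis=0)
    yc = y - y.mean()
    return (x * yc[:, None]).sum(axis=0) / (x ** 2).sum(axis=0)


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of allele dosages (0/1/2) for each individual."""
    return dosages @ weights


def variance_explained(score: np.ndarray, y: np.ndarray) -> float:
    """R^2 of a simple linear fit, i.e. the squared Pearson correlation."""
    return float(np.corrcoef(score, y)[0, 1] ** 2)


# Simulated toy data (hypothetical): 1000 individuals, 20 independent variants.
rng = np.random.default_rng(0)
n, m = 1000, 20
freqs = rng.uniform(0.05, 0.5, size=m)
dosages = rng.binomial(2, freqs, size=(n, m)).astype(float)
genetic = dosages @ rng.normal(0.0, 0.3, size=m)
protein = genetic + rng.normal(0.0, genetic.std(), size=n)  # roughly 50% of variance is genetic

# Estimate weights in a discovery half, evaluate the score in the other half.
train, test = slice(0, n // 2), slice(n // 2, n)
weights = marginal_effects(dosages[train], protein[train])
r2 = variance_explained(polygenic_score(dosages[test], weights), protein[test])
print(f"out-of-sample R^2 = {r2:.2f}")
```

Because the weights are estimated in one half and R² is measured in the other, the figure is not inflated by overfitting; computing R² in the same individuals used to estimate the weights would be optimistic.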
Year: 2020 PMID: 33303764 PMCID: PMC7729872 DOI: 10.1038/s41467-020-20079-2
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Genome-wide association signals across all tested proteins.
For clarity, variants with P > 1 × 10−5 are not shown. Variants with P < 7.45 × 10−11 (the study-wide significance threshold) are plotted in green. P-values are from a one-sided score test. Source data are provided as a Source Data file.
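The two P-value cut-offs in this caption determine which variants are drawn and which are highlighted. A minimal Python sketch of that filtering follows; the function name and the grey colour for plotted but non-significant variants are assumptions for illustration.

```python
import math

PLOT_CUTOFF = 1e-5       # variants with P > 1e-5 are omitted from the figure
SIGNIFICANCE = 7.45e-11  # variants below this threshold are plotted in green


def plot_points(p_values):
    """Map each P-value to (-log10 P, colour), or None if the variant is not drawn."""
    points = []
    for p in p_values:
        if p > PLOT_CUTOFF:
            points.append(None)
        else:
            colour = "green" if p < SIGNIFICANCE else "grey"  # grey is an assumed default
            points.append((-math.log10(p), colour))
    return points


print(plot_points([0.01, 1e-6, 1e-12]))
```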
Fig. 2 Characteristics of independently contributing pQTL variants.
The innermost circle represents replication status: dark grey for variants that replicate, medium grey for variants that do not replicate and light grey for variants for which no proxy was found in the Pomak dataset.
Fig. 3 Rare-variant pQTLs.
Rare-variant burden signals detected in this study; the most significant burden per gene is displayed. Circles denote the sequence variants identified in each region. Genes are shown in grey below the regional association plots; bars represent exons across all transcripts. Horizontal red lines indicate the −log10 of the burden-signal P-value. The size and colour of the circles are proportional to the variant weights under the weighting scheme used (CADD for ACP6, GRN and DPP7; Eigen for PON3 and IL1RL1). For CTSO, where only severe variants are considered, all variants have weights equal to 1. Grey circles denote variants not included in the burden. Details of the variants included in each burden are given in Supplementary Data 14.
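A weighted burden test collapses the rare variants in a gene into one score per individual, summing each carried allele multiplied by its weight (for example a CADD- or Eigen-derived score, or a weight of 1 for every variant in a severe-only mask such as the one described for CTSO), and then tests that score against protein level. The Python sketch below uses ordinary least squares with a normal approximation for the P-value. The function names, simulated data, example weights and the OLS test are illustrative assumptions, not necessarily the statistic used in the study.

```python
import math

import numpy as np


def weighted_burden(genotypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-individual burden: sum over variants of weight * minor-allele count (0/1/2)."""
    return genotypes @ weights


def burden_test(burden: np.ndarray, protein_level: np.ndarray):
    """OLS regression of protein level on burden; returns (beta, se, two-sided P).

    The P-value uses a normal approximation to the t statistic, adequate for large n.
    """
    n = len(burden)
    x = burden - burden.mean()
    y = protein_level - protein_level.mean()
    sxx = float((x ** 2).sum())
    beta = float((x * y).sum()) / sxx
    resid = y - beta * x
    sigma2 = float((resid ** 2).sum()) / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Simulated toy data (hypothetical): 2000 individuals, 5 rare variants at 1% frequency.
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.01, size=(2000, 5)).astype(float)
cadd_like = np.array([25.0, 18.0, 32.0, 12.0, 20.0])  # hypothetical deleteriousness weights
burden = weighted_burden(genotypes, cadd_like)
protein = 0.05 * burden + rng.normal(0.0, 1.0, size=2000)
beta, se, p = burden_test(burden, protein)
print(f"beta = {beta:.3f}, se = {se:.3f}, P = {p:.2e}")

# Severe-variant mask: every included variant gets weight 1.
severe_only = weighted_burden(genotypes[:, :2], np.ones(2))
```

Weighting lets variants predicted to be more damaging contribute more to the burden, while unit weights treat every qualifying variant equally.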
Fig. 4 Significant causal protein-disease associations identified through two-sample Mendelian randomisation.
Protein (exposure) names are indicated on the left, diseases (outcomes) on the right. Identical disease names for a given protein indicate a Mendelian randomisation (MR) signal replicating across multiple studies of the same disease; further details, and causal associations with quantitative traits, are given in Supplementary Data 4. RA: rheumatoid arthritis, IBD: inflammatory bowel disease, CD: Crohn’s disease, CHD: coronary heart disease, CAD: coronary artery disease, UC: ulcerative colitis, DKD: diabetic kidney disease, T2D: type 2 diabetes, T1D: type 1 diabetes, CKD: chronic kidney disease, MS: multiple sclerosis. Error bars denote standard errors.
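In two-sample Mendelian randomisation, variant-exposure effects (here, pQTL effects on protein level) and variant-outcome effects come from different studies and are combined into a causal estimate with a standard error. One common estimator is the fixed-effect inverse-variance weighted (IVW) estimate; the caption does not state which estimator produced the figure. The Python sketch below assumes IVW with independent instruments, and the function name and summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each instrument contributes a Wald ratio by/bx with weight bx^2 / se_y^2;
    the weighted average of those ratios simplifies to the closed form below.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, 1.0 / math.sqrt(den)


# Hypothetical summary statistics for three independent pQTLs.
bx = [0.20, 0.40, 0.10]    # effect on protein level (exposure)
by = [0.10, 0.20, 0.05]    # effect on disease log-odds (outcome)
se_y = [0.02, 0.03, 0.01]  # standard errors of the outcome effects
beta, se = ivw_estimate(bx, by, se_y)
print(f"causal estimate = {beta:.3f} (SE {se:.3f})")
```

Because every outcome effect in this toy input is half the corresponding exposure effect, the estimate is 0.5. With a single instrument the function reduces to the Wald ratio by/bx with standard error se_y/|bx|.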