Karsten Suhre, Matthias Arnold, Aditya Mukund Bhagwat, Richard J Cotton, Rudolf Engelke, Johannes Raffler, Hina Sarwath, Gaurav Thareja, Annika Wahl, Robert Kirk DeLisle, Larry Gold, Marija Pezer, Gordan Lauc, Mohammed A El-Din Selim, Dennis O Mook-Kanamori, Eman K Al-Dous, Yasmin A Mohamoud, Joel Malek, Konstantin Strauch, Harald Grallert, Annette Peters, Gabi Kastenmüller, Christian Gieger, Johannes Graumann.
Abstract
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
Year: 2017 PMID: 28240269 PMCID: PMC5333359 DOI: 10.1038/ncomms14357
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. The genome-proteome-disease network.
(a) Data sources integrated into the network, indicating the number and type of the overlapping associations, from the SNP to the disease end point; all associations are freely accessible at http://proteomics.gwas.eu. (b) Circular plot of all cis- and trans-associations, cis-pQTLs are indicated by triangles, trans-pQTLs connect associated variant locations and trans-encoded protein locations. An interactive version of this circular plot constitutes an entry point to query the integrated web-server. (c) Example of a genome-proteome-disease sub-network obtained from the server for a query using the search word ‘Crohn's Disease'. Network elements are disease traits (pink hexagons), pQTL loci (green diamonds), protein levels (blue ovals); nodes are connected by genetic associations, partial correlations and disease GWAS associations. This example (edited here for clarity) revealed four risk loci that associated with plasma levels of C7, MST, IL23R and IL18R, respectively. These four proteins all have a major role in auto-immune disorders. Partial correlations between neighbouring proteins reveal pathways that may be involved in the aetiology of Crohn's disease. Similar networks can be retrieved starting with a query using any of the 539 pQTLs, 1,124 proteins and 42 unique co-associated disease end points. In the integrated web-server, all items are interactively linked to association data from the discovery and the replication study, regional association plots based on imputed variants, locus annotations including co-associated eQTL-, meQTL-, mQTL-, regulatory-, coding- and disease risk-variants, and link-outs to relevant protein databases, original data sources and primary publications. The links in this network reflect the outcome of many natural experiments, represented by genetic variations observed in the genomes of hundreds of individuals from the general population and probed by deep proteomics phenotyping using over 1,100 different aptamers.
Figure 2. Examples of protein levels that are determined by multiple independent genetic variants.
Box-whisker plots of protein levels of (a) haemopexin (HPX) and (b) SLAMF7 as a function of genotype. Data are from the KORA study (N=997). Whiskers extend to the most extreme data point that still falls within 1.5 times the inter-quartile range. Each label gives the number of minor alleles of the respective genetic variants; for instance, in a, ‘002’ refers to individuals who are homozygous for the major alleles of rs61818956 and rs4915318 and for the minor allele of rs10494745, and in b, ‘02’ refers to homozygotes of the major allele of rs11581248 and the minor allele of rs489286. Only variant combinations that were observed in the study population are shown in the case of HPX. SNPs rs61818956, rs4915318 and rs10494745 are located in trans in the CFHR2/CFHR4 gene locus. Further examples are shown in Supplementary Fig. 2.
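The whisker rule and the genotype labels used in this caption can be illustrated with a short sketch. It is not the authors' plotting code: the function names are hypothetical, and the quartile definition (Python's "inclusive" method) is an assumption, since plotting packages differ slightly in how they compute quartiles.

```python
import statistics

def whisker_bounds(values, k=1.5):
    """Return (low, high) whisker ends: the most extreme observations
    that lie within k * IQR of the lower and upper quartiles."""
    data = sorted(values)
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    low = min(v for v in data if v >= lower_fence)
    high = max(v for v in data if v <= upper_fence)
    return low, high

def genotype_label(minor_allele_counts):
    """Concatenate per-variant minor-allele counts (0, 1 or 2) into a
    label such as '002' for three variants."""
    return "".join(str(count) for count in minor_allele_counts)

# Made-up values: 100 lies beyond the upper fence, so the upper whisker
# stops at the largest value inside the fence.
print(whisker_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
print(genotype_label((0, 0, 2)))
```

Points outside the whiskers, such as the value 100 above, would be drawn individually as outliers.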
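Table 1. List of replicated trans-pQTLs.
| Locus | Candidate genes | Sentinel SNP | Chr | Position | Cis/trans | P value (discovery) | P value (replication) | Protein |
|---|---|---|---|---|---|---|---|---|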
| 192 | GJA9ace, RHBDL2ace, RP5-864K19.6ace, RRAGCab, MYCBPac, … | rs4494114 | 1 | 39,339,682 | Trans | 1.9 × 10−20 | 3.0 × 10−8 | Kunitz-type protease inhibitor 1 (SPINT1) |
| 210 | C1orf168bc, C8Aa, C8Ba, DAB1b | rs626457 | 1 | 57,407,484 | Trans | 3.3 × 10−19 | 1.7 × 10−8 | Neurexophilin-1 (NXPH1) |
| 71 | PSRC1ac, CELSR2ac, AMPD2b, SORT1c | rs646776 | 1 | 109,818,530 | Trans | 1.0 × 10−52 | 1.7 × 10−18 | Granulins (GRN) |
| 413 | F5ae | rs9332653 | 1 | 169,490,772 | Cis | 1.6 × 10−11 | 7.5 × 10−5 | Coagulation Factor V (F5) |
| | | | | | Trans | 1.9 × 10−11 | 1.5 × 10−5 | Calcium/calmodulin-dependent protein kinase type 1 (CAMK1) |
| 17 | F5ae, SELPb, SELLb | rs4525 | 1 | 169,511,734 | Trans | 2.0 × 10−110 | 1.3 × 10−32 | Calcium/calmodulin-dependent protein kinase type 1 (CAMK1) |
| 98 | CFHa, CFHR3c | rs6695321 | 1 | 196,675,861 | Trans | 9.4 × 10−40 | 5.1 × 10−5 | Complement C1s subcomponent (C1S) |
| 72 | CFHR4ae, CFHR2a, ASPMa, ZBTB41a | rs10494745 | 1 | 196,887,457 | Trans | 1.8 × 10−52 | 2.4 × 10−11 | Haemopexin (HPX) |
| 76 | CFHR4ac, CFHR2/5a, CFHR1/3c, CFHc | rs10801582 | 1 | 196,944,357 | Trans | 1.1 × 10−49 | 2.0 × 10−11 | Haemopexin (HPX) |
| 122 | TRIM58ae | rs1339847 | 1 | 248,039,294 | Trans | 9.2 × 10−33 | 5.4 × 10−6 | Dynein light chain roadblock-type 1 (DYNLRB1) |
| 93 | COLEC11ac, ALLCc | rs7588285 | 2 | 3,648,186 | Cis | 1.1 × 10−40 | 1.4 × 10−16 | Collectin-11 (COLEC11) |
| | | | | | Trans | 2.7 × 10−37 | 8.7 × 10−20 | Interleukin-19 (IL19) |
| 157 | LTFabe, CCR5b, CCR3c | rs1126478 | 3 | 46,501,213 | Trans | 5.1 × 10−26 | 4.1 × 10−13 | Alkaline phosphatase, tissue-nonspecific isozyme (ALPL) |
| 95 | IP6K2abce, CELSR3abc, NCKIPSDabc, ARIH2abc, USP19abe,... | rs11715835 | 3 | 48,770,732 | Trans | 2.2 × 10−40 | 4.2 × 10−8 | Thioredoxin domain-containing protein 12 (TXNDC12) |
| 91 | RBM6ac, RNF123ae, BSNa, AMIGO3a, GMPPBa,... | rs4688759 | 3 | 50,008,118 | Trans | 1.8 × 10−42 | 4.2 × 10−8 | Thioredoxin domain-containing protein 12 (TXNDC12) |
| 52 | DCBLD2c, CPOXc, ST3GAL6c | rs10935480 | 3 | 98,431,986 | Trans | 9.9 × 10−70 | 1.2 × 10−17 | Vascular endothelial growth factor receptor 3 (FLT4) |
| 352 | DNAJC13abc, ACAD11abe, NPHP3ac, ACKR4a, UBA5a,... | rs17412738 | 3 | 132,257,419 | Trans | 5.3 × 10−13 | 3.8 × 10−5 | C-C motif chemokine 21 (CCL21) |
| 203 | PCOLCE2ac, U2SURPb, ATRb, PLS1b | rs4683702 | 3 | 142,617,138 | Trans | 2.0 × 10−19 | 8.3 × 10−5 | Endothelin-converting enzyme 1 (ECE1) |
| 31 | HRGae | rs2228243 | 3 | 186,395,113 | Cis | 4.7 × 10−94 | 2.7 × 10−25 | Histidine-rich glycoprotein (HRG) |
| | | | | | Trans | 4.2 × 10−82 | 6.0 × 10−18 | Dual specificity mitogen-activated protein kinase kinase 4 (MAP2K4) |
| 43 | HRGae | rs1042445 | 3 | 186,395,436 | Trans | 8.4 × 10−78 | 1.2 × 10−22 | Dual specificity mitogen-activated protein kinase kinase 4 (MAP2K4) |
| 26 | KNG1ae | rs2304456 | 3 | 186,445,052 | Cis | 2.9 × 10−97 | 2.8 × 10−40 | Kininogen-1 (KNG1) |
| | | | | | Trans | 1.4 × 10−10 | 1.5 × 10−6 | Leucine carboxyl methyltransferase 1 (LCMT1) |
| 96 | KNG1ae | rs5030062 | 3 | 186,454,180 | Trans | 3.0 × 10−40 | 1.7 × 10−13 | Coagulation Factor XI (F11) |
| | | | | | Trans | 1.8 × 10−35 | 8.0 × 10−10 | Plasma kallikrein (KLKB1) |
| 123 | SKIV2Lac, C2a, NELFEa, DXOa, STK19a,... | rs9283893 | 6 | 31,897,219 | Trans | 1.2 × 10−32 | 1.6 × 10−28 | Neutrophil collagenase (MMP8) |
| 159 | SKIV2Lace, C4Bac, TNXBab, DXOa, STK19a,... | rs387608 | 6 | 31,941,557 | Trans | 2.5 × 10−25 | 2.4 × 10−8 | gp41 C34 peptide, HIV (Human-virus) |
| 182 | TAP2a, PSMB9a, TAP1a, PSMB8a, COL11A2b,... | rs17220241 | 6 | 32,822,244 | Trans | 9.6 × 10−22 | 1.3 × 10−5 | Alpha-2-macroglobulin receptor-associated protein (LRPAP1) |
| 417 | ZFPM2a, CXCL5d | rs16873418 | 8 | 106,592,145 | Trans | 1.9 × 10−11 | 1.5 × 10−5 | Tumour necrosis factor receptor superfamily member EDAR (EDAR) |
| 442 | ABOac, SURF1c, SLC2A6c, GBGT1c | rs7857390 | 9 | 136,128,546 | Trans | 5.8 × 10−11 | 1.0 × 10−6 | Tyrosine-protein kinase receptor Tie-1, soluble (TIE1) |
| 69 | ABOabce, OBP2Bbc, DBHb, SURF1/2b, ADAMTSL2b,... | rs8176749 | 9 | 136,131,188 | Trans | 6.1 × 10−53 | 2.7 × 10−42 | Cadherin-5 (CDH5) |
| | | | | | Trans | 1.7 × 10−51 | 3.5 × 10−27 | Tyrosine-protein kinase receptor Tie-1, soluble (TIE1) |
| | | | | | Trans | 1.1 × 10−35 | 2.0 × 10−10 | Angiopoietin-1 receptor, soluble (TEK) |
| | | | | | Trans | 1.5 × 10−11 | 5.8 × 10−5 | Basal cell adhesion molecule (BCAM) |
| | | | | | Cis* | 6.0 × 10−10 | 5.3 × 10−6 | Neurogenic locus notch homolog protein 1 (NOTCH1) |
| 354 | ABOac, SURF6c | rs8176720 | 9 | 136,132,873 | Trans | 7.4 × 10−10 | 6.8 × 10−8 | Insulin receptor (INSR) |
Please see the legend of Table 2 for a description of the replicated trans-pQTL loci.
*NOTCH1 is encoded in cis on chromosome 9, but distant from ABO.
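Table 2. List of replicated trans-pQTLs.
| Locus | Candidate genes | Sentinel SNP | Chr | Position | Cis/trans | P value (discovery) | P value (replication) | Protein |
|---|---|---|---|---|---|---|---|---|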
| 36 | ABOabc, TSC1b, AK8b, SARDHb, GBGT1c | rs505922 | 9 | 136,149,229 | Trans | 1.2 × 10−20 | 1.0 × 10−8 | von Willebrand factor (VWF) |
| | | | | | Trans | 7.6 × 10−86 | 4.1 × 10−28 | CD209 antigen (CD209) |
| 269 | ABOac, DBHb, ADAMTSL2b, SARDHb, RALGDSb,... | rs630510 | 9 | 136,149,581 | Trans | 1.6 × 10−15 | 2.4 × 10−10 | Tyrosine-protein kinase receptor Tie-1, soluble (TIE1) |
| 28 | ABOac, DBHb, ADAMTSL2b, SARDHb, RALGDSb,... | rs651007 | 9 | 136,153,875 | Trans | 1.2 × 10−96 | 8.2 × 10−23 | E-Selectin (SELE) |
| | | | | | Trans | 3.9 × 10−44 | 5.2 × 10−15 | Insulin receptor (INSR) |
| | | | | | Trans | 1.1 × 10−31 | 1.4 × 10−8 | Vascular endothelial growth factor receptor 3 (FLT4) |
| | | | | | Trans | 3.3 × 10−19 | 1.2 × 10−9 | Hepatocyte growth factor receptor (MET) |
| | | | | | Trans | 3.1 × 10−13 | 1.9 × 10−5 | Vascular endothelial growth factor receptor 2 (KDR) |
| | | | | | Trans | 7.4 × 10−13 | 3.4 × 10−6 | P-Selectin (SELP) |
| | | | | | Trans | 8.0 × 10−11 | 8.2 × 10−6 | OX-2 membrane glycoprotein (CD200) |
| 79 | CPN1a, HIF1ANc | rs7091871 | 10 | 101,810,304 | Trans | 1.0 × 10−48 | 3.5 × 10−16 | Calcium/calmodulin-dependent protein kinase type 1 (CAMK1) |
| | | | | | Trans | 1.4 × 10−12 | 1.7 × 10−7 | Calpastatin (CAST) |
| 150 | SIK3ab, SIDT2bc, PCSK7b, BUD13b, RNF214b,... | rs12099358 | 11 | 116,726,048 | Trans | 1.6 × 10−26 | 1.9 × 10−8 | Beta-endorphin (POMC) |
| 65 | OAFabe, POU2F3bc, ARHGEF12b, TMEM136b, TRIM29b,... | rs692804 | 11 | 120,099,368 | Trans | 1.1 × 10−56 | 5.3 × 10−24 | Interleukin-25 (IL25) |
| 115 | C1Sabce, C1RLc | rs12146727 | 12 | 7,170,336 | Cis | 3.4 × 10−35 | 3.6 × 10−7 | Complement C1r subcomponent (C1R) |
| | | | | | Trans | 2.0 × 10−15 | 8.8 × 10−6 | Complement C1q subcomponent (C1QA C1QB C1QC) |
| 304 | POC1B-GALNT4ace, GALNT4ace, POC1Bac, ATP2B1c | rs722414 | 12 | 89,937,437 | Trans | 2.3 × 10−14 | 3.1 × 10−6 | CMRF35-like molecule 6 (CD300C) |
| 8 | ZC3H13ae, CPB2ae, LCP1c | rs1926447 | 13 | 46,629,944 | Trans | 2.2 × 10−145 | 4.3 × 10−5 | MAP kinase-activated protein kinase 3 (MAPKAPK3) |
| 155 | PROZac, PCID2a, CUL4Aa | rs515863 | 13 | 113,839,747 | Trans | 2.5 × 10−26 | 6.2 × 10−9 | Dual specificity mitogen-activated protein kinase kinase 2 (MAP2K2) |
| 67 | DHX38ac, TXNL4Ba, PMFBP1a, HPRc, DHODHc, HPc | rs9302635 | 16 | 72,144,174 | Cis | 6.3 × 10−54 | 6.2 × 10−19 | Haptoglobin (HP) |
| | | | | | Trans | 2.8 × 10−10 | 1.4 × 10−8 | Ferritin (FTH1 FTL) |
| 82 | SARM1ac, VTNa, SLC46A1a, TMEM199c, POLDIP2c, TMEM97c | rs2239908 | 17 | 26,725,265 | Trans | 9.5 × 10−48 | 3.1 × 10−10 | Semaphorin-3 A (SEMA3A) |
| | | | | | Trans | 3.0 × 10−24 | 3.9 × 10−8 | Calcium/calmodulin-dependent protein kinase type 1D (CAMK1D) |
| | | | | | Trans | 1.1 × 10−10 | 7.7 × 10−5 | WNT1-inducible-signalling pathway protein 1 (WISP1) |
| 54 | APOC4ae, APOC4-APOC2ae, APOC2a, APOEc, APOC1c,... | rs5167 | 19 | 45,448,465 | Trans | 2.1 × 10−69 | 9.5 × 10−6 | Granulocyte colony-stimulating factor (CSF3) |
| 106 | TRPC4APabc, EDEM2abc, PROCRace, GSSab, MYH7Bab,... | rs867186 | 20 | 33,764,554 | Trans | 5.5 × 10−38 | 9.0 × 10−24 | Vitamin K-dependent protein C (PROC) |
Loci that comprise at least one replicated trans-association. Loci are referenced in this study by numbers ranging from 1 to 451 (strongest to weakest) and sorted here by chromosome position. P values are for the association with inverse-normal scaled protein levels from linear regression with genotype; see Supplementary Data 1 for full data of all 539 SNP-protein associations at 451 loci, including statistics for association with alternatively raw- and log-normal-scaled protein levels and estimated replication power. Candidate genes for the protein associations were annotated by considering the following criteria: a variant in linkage disequilibrium with the sentinel SNP (r² > 0.8) is located in the gene transcript (superscript a), a variant hits a regulatory element of that gene (superscript b), a variant is a cis-eQTL (superscript c), a variant is a trans-eQTL (superscript d), a variant is protein changing (superscript e). The list of candidate genes in this table is limited to the five most plausible candidate genes for each locus. The full list is available online and in Supplementary Data 1. Every trans-pQTL implies the existence of a functional and causal link between a cis-encoded candidate gene and the trans-encoded target protein(s).
Figure 3. Genotype-dependent co-associations of the plasma proteome and the plasma N-glycome.
Bee swarm plots of total plasma N-glycans GP19 and GP33 (% of total N-glycan content) as a function of rs3760775 and rs8283 genotype, respectively (a,c); see inset for glycan structure (blue squares: N-acetylglucosamine; green circles: mannose; yellow circles: galactose; purple diamonds: N-acetylneuraminic acid). Scatter plots of total plasma N-glycans GP19 and GP33 as a function of Complement factor 4 (C4) and Galactoside 3(4)-L-fucosyltransferase (FUT3) protein levels (raw data), respectively (b,d); black: major allele homozygotes, red: heterozygotes, green: minor allele homozygotes. Large circles indicate means by genotype. P values are for the association of glycans with genotype (a,c) and of glycans with protein levels (b,d). P values are uncorrected and derived from linear regression. The major allele variant of SNP rs3760775 was reported to be associated with the cancer antigen 19-9 and that of SNP rs8283 with increased risk of rheumatoid arthritis.
Figure 4. Protein and mRNA expression levels of endoplasmic reticulum aminopeptidase 1 (ERAP1) as a function of two ankylosing spondylitis (AS)-risk alleles.
Box-whisker plots of (a) ERAP1 blood circulating protein levels and (b) ERAP1 mRNA expression levels observed in lymphoblastoid cells as a function of the sum of AS-risk alleles (minor allele of rs26496, r² = 0.46 with rs30187; major allele of rs17482078, r² = 0.96 with rs10050860); the number of individuals per genotype combination is in parentheses; whiskers extend to the most extreme data point that still falls within 1.5 times the inter-quartile range.
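The x-axis grouping in this figure, the sum of AS-risk alleles, mixes a minor risk allele (rs26496) with a major risk allele (rs17482078). A minimal sketch of that bookkeeping follows, assuming genotypes are stored as minor-allele counts (0, 1 or 2); the function name and the example genotypes are hypothetical.

```python
from collections import Counter

def as_risk_allele_sum(minor_rs26496, minor_rs17482078):
    """Total number of AS-risk alleles (0 to 4) from minor-allele counts.
    The risk allele is the minor allele at rs26496 and the major allele
    at rs17482078, as stated in the caption."""
    for count in (minor_rs26496, minor_rs17482078):
        if count not in (0, 1, 2):
            raise ValueError("minor-allele counts must be 0, 1 or 2")
    risk_rs26496 = minor_rs26496           # minor allele carries the risk
    risk_rs17482078 = 2 - minor_rs17482078  # major allele carries the risk
    return risk_rs26496 + risk_rs17482078

# Made-up genotypes as (rs26496, rs17482078) minor-allele counts.
genotypes = [(0, 2), (1, 1), (2, 0), (1, 0)]
group_sizes = Counter(as_risk_allele_sum(a, b) for a, b in genotypes)
print(sorted(group_sizes.items()))
```

The resulting counts per group correspond to the numbers of individuals the caption says are shown in parentheses.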