A phenome-wide association study to discover pleiotropic effects of PCSK9, APOB, and LDLR.

Maya S Safarova1, Benjamin A Satterfield1, Xiao Fan1, Erin E Austin1, Zhan Ye2, Lisa Bastarache3, Neil Zheng3, Marylyn D Ritchie4, Kenneth M Borthwick5, Marc S Williams6, Eric B Larson7, Aaron Scrol7, Gail P Jarvik8, David R Crosslin8,9, Kathleen Leppig10, Laura J Rasmussen-Torvik11, Sarah A Pendergrass5, Amy C Sturm6, Bahram Namjou12, Amy Sanghavi Shah13, Robert J Carroll3, Wendy K Chung14,15, Wei-Qi Wei3, QiPing Feng16, C Michael Stein16, Dan M Roden17, Teri A Manolio18, Daniel J Schaid19, Joshua C Denny3, Scott J Hebbring20, Mariza de Andrade19, Iftikhar J Kullo1.   

Abstract

We conducted an electronic health record (EHR)-based phenome-wide association study (PheWAS) to discover pleiotropic effects of variants in three lipoprotein metabolism genes: PCSK9, APOB, and LDLR. Using high-density genotype data, we tested the associations of variants in the three genes with 1232 EHR-derived binary phecodes in 51,700 European-ancestry (EA) individuals and 585 phecodes in 10,276 African-ancestry (AA) individuals; 457 PCSK9, 730 APOB, and 720 LDLR variants were filtered by imputation quality (r² > 0.4), minor allele frequency (>1%), linkage disequilibrium (r² < 0.3), and association with LDL-C levels, yielding a set of two PCSK9, three APOB, and five LDLR variants in EA but no variants in AA. Cases and controls were defined for each phecode using the PheWAS package in R. Logistic regression assuming an additive genetic model was used with adjustment for age, sex, and the first two principal components. Significant associations were tested in additional cohorts from Vanderbilt University (n = 29,713), the Marshfield Clinic Personalized Medicine Research Project (n = 9562), and UK Biobank (n = 408,455). We identified one PCSK9, two APOB, and two LDLR variants significantly associated with an examined phecode. Only one of the variants was associated with a non-lipid disease phecode (“myopia”), but this association was not significant in the replication cohorts. In this large-scale PheWAS we did not find LDL-C-related variants in PCSK9, APOB, and LDLR to be associated with non-lipid-related phenotypes including diabetes, neurocognitive disorders, or cataracts.

Year:  2019        PMID: 30774981      PMCID: PMC6370860          DOI: 10.1038/s41525-019-0078-7

Source DB:  PubMed          Journal:  NPJ Genom Med        ISSN: 2056-7944            Impact factor:   8.617


Introduction

Genetic pleiotropy is widespread; ~5% of common variants and ~17% of genomic regions are associated with more than one phenotype.[1] Genes implicated in lipoprotein metabolism are no exception and have been reported to be associated with type 2 diabetes.[2-5] The National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) Genome-wide Association Study (GWAS) catalog[4] lists additional possible associations of variants near these genes with diverse diseases including Wilms’ tumor, allergic rhinitis, and bipolar disorder, among others. Drugs specifically targeting genes or gene products involved in lipoprotein metabolism may therefore have unintended effects.[6,7] Pathogenic variants in proprotein convertase subtilisin/kexin type 9 (PCSK9), apolipoprotein B (APOB), and low-density lipoprotein receptor (LDLR) can lead to familial hypercholesterolemia (FH). PCSK9 influences LDLR density on the hepatocyte surface and thereby low-density lipoprotein-cholesterol (LDL-C) levels through LDLR recycling.[8] The gene product of APOB is found on LDL particles and is the ligand for LDLR.[9] Recent reports demonstrate links between LDLR variants that lead to FH and decreased risk of diabetes.[2] Conversely, statin therapy, which increases LDLR expression, is associated with risk of developing diabetes.[10] Increased risk of diabetes was noted in carriers of the LDL-C lowering variant in LDLR, rs6511720.[11] Monoclonal antibodies targeting PCSK9 and APOB antisense inhibitors are effective in lowering LDL-C levels and appear to lower the risk of atherosclerotic cardiovascular disease (ASCVD) events.[12-14] The drugs have been approved for clinical use; however, long-term safety data are lacking. In particular, several studies suggest that these drugs may increase risk of diabetes,[11,15,16] neurocognitive impairment,[17-21] and cataracts,[22] although to date such associations have not been observed in prospective randomized controlled trials.
The current study attempted to identify pleiotropic effects of variants in PCSK9, APOB, and LDLR that influence LDL-C levels, with a particular focus on associations with diabetes, neurocognitive impairment, and cataracts given the concern raised in prior reports. We conducted a comprehensive agnostic investigation of associations of PCSK9, APOB, and LDLR with non-lipid phenotypes on a phenome-wide scale to complement previous Mendelian randomization and post hoc analyses that raised concern of putative adverse associations. The phenome-wide association study (PheWAS) approach starts with genetic variants or genes of interest and then tests a large number of phenotypes for association. Such an approach has revealed numerous previously unreported genotype–phenotype associations[23,24] and provided insights into evolutionary genetics[25] and drug repositioning.[26] We attempted to extend prior studies by including individuals of diverse ethnic backgrounds, given the known differences in lipid levels by race/ethnicity,[27-30] and by using real-world patient electronic health record (EHR) data. We leveraged high-density genotyping data linked to EHR-derived phenotypes from the electronic MEdical Records and GEnomics (eMERGE) Network[31,32] to conduct a PheWAS to test the association of variants in PCSK9, APOB, and LDLR with non-lipid phenotypes, including diabetes, neurocognitive disorders, and cataracts. Associations were validated by conducting a cross validation in the eMERGE discovery cohort. Replication of significant PCSK9-trait, APOB-trait, and LDLR-trait associations was pursued in three independent cohorts: the Vanderbilt DNA biobank (BioVU), comprising individuals of European-ancestry (EA) and African-ancestry (AA), and the Marshfield Personalized Medicine Research Project (PMRP) and the UK Biobank,[33] both comprising EA individuals.

Results

Discovery cohort study population

Clinical characteristics of study participants from the discovery and three replication cohorts are shown in Table 1. Of the 83,985 individuals from the 12 eMERGE sites (Supplementary Table 1), 51,700 EA individuals (mean age 58 ± 16 years, 54% female) and 10,276 AA individuals (mean age 51 ± 16 years, 67% female) passed our quality control filters and had high-density genotyping data with imputed PCSK9, APOB, and LDLR variants, linked to the EHR.
Table 1

Clinical characteristics of study participants

| Variable | Discovery cohort: eMERGE Network (N = 62,210) | | Replication cohort 1: Marshfield PMRP (N = 9562) | Replication cohort 2: BioVU (N = 29,713) | | Replication cohort 3: UK Biobank (N = 408,455) |
|---|---|---|---|---|---|---|
| Race | EA | AA | EA | EA | AA | EA |
| n | 51,700 | 10,276 | 9562 | 26,582 | 3131 | 408,455 |
| Mean age (years) | 58 | 51 | 62 | 62 | 61 | 57 |
| Female (%) | 54 | 67 | 62 | 58 | 52 | 54 |

AA African-ancestry; BioVU Vanderbilt DNA biobank; EA European-ancestry; eMERGE electronic MEdical Records and GEnomics Network; PMRP Marshfield Clinic Personalized Medicine Research Project

Selection of variants

Collectively, individuals in the discovery set had 457 PCSK9, 730 APOB, and 720 LDLR variants. After applying quality control filters and other selection criteria including association with LDL-C, for the primary analysis, two PCSK9, three APOB, and five LDLR variants remained for PheWAS analysis in the EA cohort, but no variants remained for PheWAS analysis for the AA cohort (Fig. 1 and Table 2). Eight of these 10 variants had been tested in the Global Lipids Genetics Consortium (http://lipidgenetics.org/) and found to be significantly associated with LDL-C (Table 2).
Fig. 1

Selection of variants in the discovery cohort for the primary analysis. Collectively, individuals in the discovery cohort contained the number of variants shown for PCSK9, APOB, and LDLR. These variants were passed through various quality control filters and other selection measures including imputation quality (r² > 0.4), minor allele frequency (MAF) > 1%, LDL-C association at the given thresholds for EA and AA, and linkage disequilibrium (r² < 0.3). The variants passing these filters were used in the primary analysis. The rsID for each variant is shown.

Table 2

Variants that passed quality control filters in the primary analysis compared with the Global Lipids Genetics Consortium

| Gene | Chr | Position (a) | rsID | Ref | Alt | Annotation | eMERGE cohort: MAF EA (%) | eMERGE cohort: Beta | eMERGE cohort: p-value LDL-C | GLGC metabochip: MAF in 1kGP (%) | GLGC metabochip: Beta (b) | GLGC metabochip: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PCSK9 | 1 | 55505647 | rs11591147 | G | T | Missense | 1.4 | −12.97 | 1.3 × 10⁻²⁷ | 1.7 | −0.50 | 1.6 × 10⁻¹⁴² |
| PCSK9 | 1 | 55519015 | rs639750 | T | G | Intron | 32.7 | −1.82 | 1.0 × 10⁻⁹ | | | |
| APOB | 2 | 21233972 | rs533617 | T | C | Missense | 3.8 | −4.40 | 1.3 × 10⁻⁹ | 4.9 | −0.14 | 1.7 × 10⁻²⁷ |
| APOB | 2 | 21263639 | rs531819 | G | T | Intron | 15.5 | −4.07 | 2.6 × 10⁻²⁶ | 19.1 | −0.12 | 1.3 × 10⁻⁵⁷ |
| APOB | 2 | 21263900 | rs1367117 | G | A | Missense | 31.6 | 3.52 | 6.4 × 10⁻³² | 71.2 | −0.11 | 1.4 × 10⁻⁷⁵ |
| LDLR | 19 | 11202306 | rs6511720 | G | T | Regulatory intron | 11.4 | −5.79 | 4.2 × 10⁻³⁹ | 9.8 | −0.23 | 2.8 × 10⁻¹⁵¹ |
| LDLR | 19 | 11206575 | rs6511721 | A | G | Retained intron | 48.3 | 1.73 | 5.6 × 10⁻¹⁰ | 48.8 | −0.06 | 1.5 × 10⁻²⁹ |
| LDLR | 19 | 11227480 | rs2738447 | C | A | Nonsense mediated decay | 41.5 | −1.67 | 4.0 × 10⁻⁹ | 42.9 | −0.05 | 8.4 × 10⁻¹³ |
| LDLR | 19 | 11231203 | rs72658867 | G | A | Splice regions | 1.1 | −10.20 | 2.8 × 10⁻¹⁴ | | | |
| LDLR | 19 | 11243445 | rs5742911 | A | G | 3′ UTR | 30.7 | −1.79 | 3.7 × 10⁻⁹ | 26.8 | −0.06 | 5.3 × 10⁻²⁴ |

Selection criteria: Imputation quality r² > 0.4; MAF > 1%; LDL-C association (threshold of 5.0 × 10⁻⁸); LD r² < 0.3

GLGC Global Lipids Genetics Consortium, Chr chromosome number, Ref reference allele, Alt alternate allele, MAF minor allele frequency, LDL-C low-density lipoprotein cholesterol, 1kGP 1000 Genomes program

(a) Position in human genome assembly hg19

(b) The difference in Beta between eMERGE and GLGC is primarily due to differences in units of measurements. eMERGE used mg/dL while GLGC used mmol/L

To determine whether variants not associated with LDL-C levels in the three genes were associated with other phenotypes, a secondary analysis was performed with a similar selection process in the discovery cohort that included “missense” variants not associated with LDL-C. This yielded four PCSK9 (three in EA cohort, four in AA cohort), 15 APOB (5 in EA cohort, 12 in AA cohort), and one LDLR (one in both the EA and AA cohorts) variants suitable for PheWAS analysis (Supplementary Figure 1; Supplementary Table 2).

Selection of phecodes

Of the 1815 available phenotypes, 1232 and 585 passed quality control filters for the EA and AA cohorts, respectively (Supplementary Data 1). Phecodes representing diabetes, neurocognitive disorders, and cataracts are listed in Supplementary Tables 3–5, respectively. A summary of the selection strategy for participants, variants, and phecodes, as well as the replication analysis and five-fold cross validation is shown in Fig. 2.
Fig. 2

Study outline for primary analysis. AA African-ancestry, EA European-ancestry, EHR electronic health record, eMERGE electronic MEdical Records and GEnomics Network, LD linkage disequilibrium, PMRP Personalized Medicine Research Project, QC quality control

PheWAS results

In the discovery cohort, the PheWAS identified one PCSK9, two APOB, and two LDLR variants in the EA sample that were significantly associated (p < 5.8 × 10⁻⁵) with an examined phecode (Fig. 1 and Table 3). Only one of the variants, the LDLR variant rs6511720, was associated with a non-lipid/non-ASCVD phecode: “myopia.” These five variants underwent additional analyses described below. Several of the variants trended towards association with ischemic heart disease, with the strongest association seen for rs639750 in PCSK9 (p = 0.0065, OR 0.96).
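Table 3

Significant associations in the discovery and replication cohorts

| Phecode | Description | Variant | MAF (%) | eMERGE discovery: Cases | eMERGE discovery: Controls | eMERGE discovery: p value (a) | eMERGE discovery: Odds ratio (b) | eMERGE discovery: 95% CI | eMERGE 5-fold cross validation: p value (b) | Marshfield replication: Cases | Marshfield replication: Controls | Marshfield replication: p value (a) | Vanderbilt replication: Cases | Vanderbilt replication: Controls | Vanderbilt replication: p value (a) | UK Biobank: p value (a) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lipid-related phecode associations | | | | | | | | | | | | | | | | |
| 272 | Disorders of lipid metabolism | rs11591147 | 1.4 | 25,298 | 17,205 | 3.16 × 10⁻¹² | 0.64 | 0.51–0.76 | 1.2 × 10⁻¹⁰ | 8291 | 1796 | 8.2 × 10⁻⁵ | 9076 | 14,560 | 3.5 × 10⁻³ | 2.1 × 10⁻²⁸ |
| 272 | Disorders of lipid metabolism | rs531819 | 15.5 | 25,298 | 17,205 | 2.30 × 10⁻⁹ | 0.88 | 0.84–0.92 | 1.4 × 10⁻⁸ | | | | | | | |
| 272 | Disorders of lipid metabolism | rs1367117 | 31.5 | 25,298 | 17,205 | 2.55 × 10⁻⁵ | 1.07 | 1.04–1.10 | 5.4 × 10⁻⁴ | | | | 9314 | 18,219 | 6.7 × 10⁻⁴ | 8.8 × 10⁻²² |
| 272 | Disorders of lipid metabolism | rs6511720 | 11.4 | 25,298 | 17,205 | 2.78 × 10⁻¹⁵ | 0.83 | 0.78–0.87 | 7.0 × 10⁻¹⁵ | 7852 | 1710 | 5.1 × 10⁻³ | 9346 | 18,219 | 1.3 × 10⁻⁴ | 2.0 × 10⁻⁴⁸ |
| 272 | Disorders of lipid metabolism | rs6511721 | 48.3 | 25,298 | 17,205 | 1.73 × 10⁻⁵ | 1.07 | 1.04–1.10 | 4.0 × 10⁻⁴ | | | | | | | 2.5 × 10⁻¹⁶ |
| 272.1 | Hyperlipidemia | rs11591147 | 1.4 | 25,168 | 17,205 | 2.84 × 10⁻¹² | 0.64 | 0.51–0.76 | 9.6 × 10⁻¹¹ | 7666 | 1796 | 1.8 × 10⁻⁵ | 9050 | 14,560 | 3.9 × 10⁻³ | |
| 272.1 | Hyperlipidemia | rs531819 | 15.5 | 25,168 | 17,205 | 1.35 × 10⁻⁹ | 0.88 | 0.84–0.92 | 8.4 × 10⁻⁹ | | | | | | | |
| 272.1 | Hyperlipidemia | rs1367117 | 31.5 | 25,168 | 17,205 | 1.41 × 10⁻⁵ | 1.07 | 1.04–1.11 | 3.6 × 10⁻⁴ | | | | 9346 | 18,219 | 1.0 × 10⁻³ | 2.1 × 10⁻⁹³ |
| 272.1 | Hyperlipidemia | rs6511720 | 11.4 | 25,168 | 17,205 | 3.27 × 10⁻¹⁵ | 0.83 | 0.78–0.87 | 9.8 × 10⁻¹⁵ | 7259 | 1710 | 3.1 × 10⁻³ | 9314 | 18,219 | 9.8 × 10⁻⁵ | 6.7 × 10⁻¹²⁴ |
| 272.1 | Hyperlipidemia | rs6511721 | 48.3 | 25,168 | 17,205 | 1.91 × 10⁻⁵ | 1.07 | 1.04–1.10 | 4.6 × 10⁻⁴ | | | | | | | 8.5 × 10⁻³¹ |
| 272.11 | Hypercholesterolemia | rs11591147 | 1.5 | 11,753 | 17,205 | 3.62 × 10⁻¹⁰ | 0.60 | 0.44–0.76 | 3.4 × 10⁻⁸ | 5602 | 1796 | 2.5 × 10⁻⁵ | 3840 | 14,560 | 8.0 × 10⁻⁴ | 6.3 × 10⁻⁷² |
| 272.11 | Hypercholesterolemia | rs531819 | 15.7 | 11,753 | 17,205 | 6.49 × 10⁻⁸ | 0.87 | 0.81–0.92 | 1.2 × 10⁻⁷ | | | | | | | |
| 272.11 | Hypercholesterolemia | rs6511720 | 11.5 | 11,753 | 17,205 | 5.33 × 10⁻¹⁴ | 0.80 | 0.74–0.86 | 6.1 × 10⁻¹³ | 5316 | 1710 | 1.4 × 10⁻³ | 3953 | 18,219 | 2.7 × 10⁻³ | |
| 272.13 | Mixed hyperlipidemia | rs6511720 | 12.0 | 4942 | 17,205 | 3.90 × 10⁻⁶ | 0.84 | 0.76–0.91 | 7.1 × 10⁻⁵ | 147 | 1710 | 3.6 × 10⁻¹ | 4572 | 18,219 | 8.5 × 10⁻⁴ | |
| Non-lipid-related phecode associations | | | | | | | | | | | | | | | | |
| 367.1 | Myopia | rs6511720 | 11.4 | 4138 | 36,272 | 1.76 × 10⁻⁵ | 0.85 | 0.77–0.92 | 8.8 × 10⁻⁴ (c) | 3879 | 1868 | 4.5 × 10⁻¹ | 823 | 27,142 | 3.5 × 10⁻¹ | 4.6 × 10⁻¹ |

ICD-9 codes were extracted from individual EHRs and converted to phecodes using the PheWAS R package

CI confidence interval, LDL-C low-density lipoprotein cholesterol, MAF minor allele frequency

(a) Bold values are statistically significant

(b) Odds ratio refers to the Alt allele

(c) Borderline significant, other variants in LD with this variant were significant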

A secondary PheWAS analysis of additional missense variants not associated with LDL-C was performed. None of these variants were significantly associated with a phecode in the EA or AA cohorts; therefore, no further tests with these variants were performed. Our analyses included EA and AA individuals. However, when we included the remaining 2182 non-EA/non-AA individuals (Supplementary Table 1) with the EA group, our inferences were similar. Two low-frequency PCSK9 variants, rs67608943 and rs28362286, have been associated with lower LDL-C levels in AA individuals. As no AA variants passed our selection criteria for PheWAS analysis, we performed an additional analysis with these two variants, but did not find these variants to have any significant associations.

Myopia association

There were 16 LDLR variants in LD (r² > 0.3) with rs6511720 that were also associated with myopia. Of these, rs2228671 had the strongest association with “myopia” but a weaker association with the lipid-related phecodes. Manhattan plots of phecode associations of the LDLR variants rs6511720 (Supplementary Figure 2a) and rs2228671 (Supplementary Figure 2b) highlight that these variants, although in LD, have varying strengths of association. Supplementary Figure 3 presents the strength of association with the phecode “hypercholesterolemia” or LDL-C levels, myopia, and myopia adjusted for the phecode “hypercholesterolemia” or LDL-C levels for the 16 variants in LD. The strength of association with myopia was attenuated but remained significant after adjustment for hypercholesterolemia or LDL-C levels. Based on LD, the 16 variants associated with myopia could be placed into four groups (Supplementary Figure 3). Variants in the same group had an r² > 0.98. The variant rs6511720 (blue), relatively distant from the remaining variants, had the strongest association with LDL-C level. rs2228671 (green), along with another nine variants in its group, was most strongly associated with myopia. When eMERGE consortium site was added as a covariate in the analysis, the signal for myopia was no longer significant, suggesting that one or a few sites were driving the association.

Cross validation and replication

Using five-fold cross validation, most of the lipid-related phecode associations of the PCSK9, APOB, and LDLR variants remained significant (p < 4.1 × 10⁻⁵). The association between the LDLR variant rs6511720 and the phecode “myopia” was borderline significant (Table 3). Other variants in LD with rs6511720 also had borderline significant associations with the phecode “myopia.” When eMERGE consortium site was added as a covariate in the cross validation analysis, the signal for myopia was no longer significant, again suggesting that one or a few sites were driving the association. All lipid-related phecode associations of the PCSK9, APOB, and LDLR variants were replicated in the Marshfield PMRP, BioVU, and/or UK Biobank cohorts; however, the non-lipid association of rs6511720 with the phecode “myopia” was not confirmed in any of the replication cohorts (Table 3).

Comparison to the GWAS catalog

We examined the NHGRI-EBI GWAS catalog[4] for all reported variants within the boundaries of PCSK9, APOB, and LDLR. We found 27 variants (4 in PCSK9, 14 in APOB, and 9 in LDLR) with 86 reported associations. Six of these variants were protein-function altering, either missense or stop-gain. Two variants were not available in the eMERGE dataset; therefore, we tested the remaining 70 associations in the eMERGE dataset. Of those 70, 28 were significant lipid associations, and no significant pleiotropic effects (cross-phenotype associations) were present, including no association with “myopia.” Eight variants were not available in the UK Biobank dataset; therefore, we tested the remaining 55 associations in the UK Biobank. All of these were significant, replicating previously reported associations with lipid levels, ischemic heart disease, and disorders of lipoprotein metabolism. There were no significant pleiotropic effects (including no association with “myopia”). A list of reported associations with the UK Biobank code descriptions and eMERGE phecode equivalent is presented in Supplementary Data 2.

Power

We calculated power using the R package “powerMediation”. For logistic regression analyses with phecode as the binary outcome and genotypes as discrete predictors, power was calculated for each pair of variant and phecode, based on sample size, allele frequency for each variant, odds ratio (OR), and type I error α = 4.1 × 10⁻⁵. We had more than 80% power to detect 30% of associations in EA individuals. However, power for individual variants was low (Supplementary Figure 4); for higher frequency variants, power for the phecodes “ischemic heart disease” and “type 2 diabetes” was 0.175 and 0.143, respectively.

Discussion

In a large PheWAS we confirmed the association of PCSK9, APOB, and LDLR with disorders of lipid metabolism (hypercholesterolemia) at the variant level. We found no evidence that variation in PCSK9, APOB, and LDLR is associated with diabetes or any non-lipid phenotypes including neurocognitive disorders or cataract. This includes the PCSK9 variant rs11591147 and the LDLR variant rs6511720 for which prior studies have reported borderline significant associations with increased risk of diabetes.[11,34] In the NHGRI-EBI GWAS catalog, no associations of PCSK9, APOB, or LDLR variants with diabetes, neurocognitive disorders, or cataract have been reported. Additionally, an examination of the UK Biobank all-by-all PheWAS browser (http://pheweb.sph.umich.edu) did not demonstrate pleiotropic effects for any tested variants in PCSK9, APOB, or LDLR. In our discovery cohort we identified an association of several variants in LDLR with “myopia”, but none of these were confirmed in the replication cohorts and only the association between some LDLR variants including rs2228671 and “myopia” was present on five-fold cross validation. We were unable to find any physiological basis in the literature for an association between lipid level or lipid genes and myopia, and given the lack of replication, this could be a false positive association. Long-term safety data on PCSK9 inhibitors are not available given the limited follow up of clinical trials that have been conducted so far.[35] In particular, there is a theoretical concern for increased risk of diabetes, neurocognitive disorders, and cataracts. The U.S. Food and Drug Administration issued a directive to monitor for adverse neurocognitive events in patients treated with PCSK9 inhibitors,[36] and ongoing pharmacovigilance programs are in place. In our analysis, we did not find a significant association between PCSK9 variation and neurocognitive disorders apart from the borderline association with “myopia”. 
In the NHGRI-EBI GWAS catalog[4] common variants at the PCSK9 and APOB loci were associated with non-lipid/non-ASCVD traits.[37-45] Most of these variants were intergenic and were therefore excluded from our study which only included variants within the gene borders. Three variants (rs6006893, rs219553, and rs2495478) were intronic and therefore of uncertain functional significance. The association of variant rs2495478 with Wilms’ tumor was not replicated and the other two variants were not present in the eMERGE dataset to compare. The UK Biobank PheWAS browser also did not list any of these associations reported in the GWAS catalog (Supplementary Data 2). Therefore, we did not confirm the associations reported in the NHGRI-EBI GWAS catalog for variants available in our analyses. Two recent studies reported differing results regarding the association between the LDL-C lowering variant rs11591147 and risk of diabetes.[11,34] In a Mendelian randomization study PCSK9 variants associated with low LDL-C levels (rs11583680, rs11591147, rs2479409, and rs11206510) modestly increased risk of diabetes (OR 1.29; 1.11–1.50).[15] A meta-analysis encompassing 50,775 individuals with type 2 diabetes and 270,269 control subjects revealed an OR of 1.09 for rs11591147, a cholesterol-lowering variant[11] matching an OR of 1.11 (1.04–1.19) for each 10 mg PCSK9-mediated decrease in LDL-C levels.[16] Circulating PCSK9 levels are increased in patients with diabetes and metabolic syndrome.[46] On the other hand, a recent report found no association between rs11591147 and markers of glucose homeostasis or diabetes[34] and no evidence of increased risk of new-onset diabetes was found in a pooled analysis of 10 phase III trials of PCSK9 inhibitors with a follow-up period of 6–18 months.[47] Additional studies and longer-term follow-up of PCSK9 inhibitors may be needed to confirm/refute an association with diabetes. 
Individuals with FH have been reported to have decreased risk of diabetes, and there are also links between the use of statins and an increased risk of diabetes. However, no studies have identified an association between specific APOB or LDLR variants and diabetes. We also did not find any association of specific variants in these genes with any of the 19 phecodes associated with diabetes. Of note, a recent GWAS report found that only a very small fraction of LDL-C lowering genetic variants (only 5 out of 113 variants from 90 distinct loci) were associated with type 2 diabetes.[48] None of these were in PCSK9, APOB, or LDLR. However, a lack of pleiotropic effects in a subset of variants does not exclude the possibility of pleiotropic effects for other variants in the studied genes or in other ethnic backgrounds. We evaluated the previously reported association between lipid-lowering drugs and the risk of cataracts[17,18] but observed no significant signal for PCSK9, APOB, or LDLR and any of the six tested phecodes pertinent to cataracts. We did not find the loss-of-function rs11591147 (R46L) variant to be associated with hemorrhagic stroke, although low LDL-C levels on lipid-lowering drugs have been associated with the risk of intracerebral hemorrhage.[49] While this manuscript was being reviewed, two sets of PheWAS results were published for PCSK9 variants. In the first,[50] a gene-centric score derived from four PCSK9 variants (rs11583680, rs11591147, rs2479409, and rs11206510) that were associated with LDL-C in the Global Lipids Genetics Consortium (http://lipidgenetics.org/) was associated with myocardial infarction and type 2 diabetes. Associations for individual variants were not reported.
The second of these studies[51] examined only a single PCSK9 variant, rs11591147, in 337,536 individuals of predominantly European ancestry in the UK Biobank and demonstrated it to be associated with hyperlipidemia and coronary heart disease, which is similar to our results, which trended toward association with ischemic heart disease but not with type 2 diabetes. Neither of these studies found any associations of PCSK9 variants with neurocognitive disorders or cataracts, nor did they examine variants in APOB or LDLR. In summary, our primary analysis identified only one pleiotropic effect, “myopia,” in the discovery cohort for LDLR, which remained borderline significant on five-fold cross validation and was not replicated in any of the three replication cohorts. A PheWAS for missense variants not associated with LDL-C also did not identify any pleiotropic effects. Lastly, we did not replicate the associations reported in the NHGRI-EBI GWAS catalog for PCSK9, APOB, and LDLR variants.

Strengths and limitations

The present study included a larger sample size of AA individuals than previous PheWAS analyses. Also, in addition to correcting for multiple testing, we evaluated significant results in a large discovery cohort, three large independent replication cohorts, and conducted five-fold cross validation. Replication of the known associations with LDL-C[52] in directions consistent with previous epidemiologic and genetic studies provided an internal validation of our PheWAS approach. Our primary analysis was restricted to only functional PCSK9, APOB, and LDLR variants but we did perform a secondary analysis including only “missense” mutations with similar results. Several limitations are worth noting. First, EHRs are a repository of longitudinal data that capture phenotypes with varying resolution, thus their use for research may be subject to misclassification; some control subjects may have limited contact with the health care system possibly leading to misclassification in those individuals. Second, although the sample size of AA individuals was larger than previous studies, it was relatively small compared to the EA cohort and may not be sensitive in detecting pleiotropic associations. Given that genetic structure varies across populations of different ancestry backgrounds, there is a need to assess phenotype–genotype associations in diverse ethnic groups, including individuals of African, Asian, and Hispanic/Latino ancestry. Third, the phecodes in UK Biobank did not correspond exactly to the phecodes in the eMERGE cohort so best approximations had to be applied. Fourth, although the associations between the LDL-C-related variants and ischemic heart disease trended towards significance, these did not reach the Bonferroni threshold, highlighting that there could be pleiotropic associations that were simply below the threshold of detection in our dataset. 
Fifth, general limitations of the PheWAS approach that are not specific to our study include low power to detect weaker pleiotropic effects and inability to directly address potential off-target side effects of pharmacologic manipulation of the examined genes.

Conclusion

In this large-scale PheWAS we did not find LDL-C associated or missense variants in PCSK9, APOB, and LDLR to be associated with non-lipid phenotypes; specifically no association was seen with neurocognitive disorders, diabetes, or cataracts. These data suggest a lack of major pleiotropic effects of the tested PCSK9, APOB, and LDLR variants.

Methods

Genotyping, quality control, and selection criteria

High-density genotype data were available for 83,985 participants of the eMERGE network. To unify the genotype data processed on 78 different chips from 12 contributing sites, each genotype array batch was imputed via the Michigan Imputation Server (MIS; https://imputationserver.sph.umich.edu/) and all imputed batches of data were combined into a unified dataset. The imputation was based on the minimac3 algorithm[53] and the genotype reference panel was from the Haplotype Reference Consortium.[54] All research activities were reviewed and approved by the Institutional Review Board (IRB) at each eMERGE site and all research subjects gave written informed consent. Medications were extracted from prescription databases and/or clinic notes for each institution. Lipid lowering medications (LLMs) included: cerivastatin, rosuvastatin, simvastatin, fluvastatin, pravastatin, lovastatin, atorvastatin, and pitavastatin. For the majority (76.3%) of participants, we used median LDL-C levels prior to the use of any LLM. For the remaining 23.7% of participants with LDL-C levels while on LLM, the median LDL-C level was divided by 0.75 to impute LDL-C levels prior to initiating LLM,[55] assuming a 25% reduction in LDL-C on therapy. To assess association with LDL-C, we used an additive genetic model with age, sex, LLM status, and the first two principal components as covariates. For the primary analysis we tested variants meeting the following criteria: within the PCSK9, APOB, or LDLR gene boundary (using NCBI gene reference; PCSK9, chromosome 1: 55505149–55530526; APOB, chromosome 2: 21224301–21266945; LDLR, chromosome 19: 11200037–11244506), minor allele frequency (MAF) > 1%, high imputation quality (r² > 0.4), associated with LDL-C level, and not in linkage disequilibrium (r² < 0.3). For a group of variants in LD, we picked the one with the strongest association with LDL-C.
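The medication adjustment described above amounts to a single scaling step. The following Python sketch is illustrative only: the function name and the example LDL-C values are hypothetical, and it assumes one list of LDL-C measurements (in mg/dL) per participant together with a flag indicating whether those measurements were taken on an LLM.

```python
from statistics import median


def pretreatment_ldl(ldl_values_mg_dl, on_llm, assumed_reduction=0.25):
    """Return the median LDL-C, scaled up when measured on lipid-lowering medication.

    With assumed_reduction=0.25 the on-treatment median is divided by 0.75,
    mirroring the adjustment described in the text. The function name and
    inputs are hypothetical, not taken from the study's own code.
    """
    med = median(ldl_values_mg_dl)
    if on_llm:
        return med / (1.0 - assumed_reduction)
    return med


# Made-up measurements for two illustrative participants.
untreated = pretreatment_ldl([130.0, 142.0, 150.0], on_llm=False)
treated = pretreatment_ldl([90.0, 96.0, 105.0], on_llm=True)
print(untreated, treated)
```

In this toy example the untreated participant keeps the observed median, while the treated participant's median of 96 mg/dL is scaled to 128 mg/dL.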
The standard genome-wide significance threshold of p < 5.0 × 10−8 was used for both the EA and AA cohorts to determine association with LDL-C. For the secondary analysis we tested all variants meeting the following criteria: within the PCSK9, APOB, or LDLR gene boundary, MAF > 1%, missense variants that were not associated with LDL-C level, high imputation quality (r2 > 0.4), and not in linkage disequilibrium (r2 < 0.3). SeattleSeq (http://snp.gs.washington.edu/SeattleSeqAnnotation138/) was used to annotate variant function, including identifying missense mutations. We randomly removed one participant from each pair of first-degree relatives using identity-by-descent (IBD) measures.[56] We performed principal component analysis in the eMERGE cohort and 2504 samples from the 1000 Genomes Project phase 3[57] to infer genetic ancestry. We also stratified analyses for AA individuals and EA individuals. We restricted our analyses to adults (age > 18 years). If a participant had only one instance or encounter for any of the component ICD codes, that participant was excluded from the analysis of the corresponding phecode.
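The primary and secondary variant-selection criteria described above can be sketched as a filter followed by a greedy LD-pruning pass. This is an illustrative sketch, not the authors' pipeline: the `Variant` fields, the pairwise `ld_r2` lookup, and the variant identifiers used with it are hypothetical, the gene-boundary check is omitted, and the greedy pass is only one plausible way to "pick the strongest variant in an LD group". For the secondary analysis the text does not say how a representative was chosen within an LD group, so ordering by LDL-C p-value there is an assumption.

```python
from dataclasses import dataclass

GWAS_P = 5.0e-8          # genome-wide significance threshold from the text
MIN_MAF = 0.01           # MAF > 1%
MIN_IMPUTATION_R2 = 0.4  # imputation quality r2 > 0.4
MAX_LD_R2 = 0.3          # retained variants must have pairwise r2 < 0.3


@dataclass(frozen=True)
class Variant:
    rsid: str
    maf: float
    imputation_r2: float
    ldl_p: float           # p-value for association with LDL-C
    missense: bool = False


def passes_filters(v, secondary=False):
    """Apply the per-variant criteria (gene-boundary check omitted)."""
    if v.maf <= MIN_MAF or v.imputation_r2 <= MIN_IMPUTATION_R2:
        return False
    if secondary:
        # Secondary analysis: missense variants NOT associated with LDL-C.
        return v.missense and v.ldl_p >= GWAS_P
    # Primary analysis: variants associated with LDL-C.
    return v.ldl_p < GWAS_P


def select_variants(variants, ld_r2, secondary=False):
    """Greedy LD pruning: visit variants from strongest to weakest LDL-C
    association and keep one only if it is not in LD with any kept variant.

    ld_r2 maps frozenset({rsid_a, rsid_b}) -> pairwise r2; missing pairs
    are treated as r2 = 0 (an assumption of this sketch).
    """
    candidates = sorted(
        (v for v in variants if passes_filters(v, secondary)),
        key=lambda v: v.ldl_p,
    )
    kept = []
    for v in candidates:
        if all(ld_r2.get(frozenset((v.rsid, k.rsid)), 0.0) < MAX_LD_R2
               for k in kept):
            kept.append(v)
    return kept
```

Sorting by p-value before pruning ensures that, within any group of mutually correlated variants, the one retained is the one with the strongest LDL-C association.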

Phenotyping

We converted International Classification of Diseases, Ninth Revision (ICD-9) codes from EHRs to 1815 phecodes[58] using the PheWAS package.[59] A ‘case’ for a given phecode was defined as a participant with a minimum of two ICD-9 codes on different dates. Controls did not have any related phecodes, according to the exclusion criteria embedded in the PheWAS package. To retain statistical power, we only analyzed phecodes with ≥200 cases.[60]
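The case, control, and exclusion rules above can be illustrated with a short sketch. It is not the PheWAS R package and does not reproduce its embedded control-exclusion ranges; the function names and input layout are hypothetical. Counting distinct dates (so that two codes on the same day count once) is an interpretation of "two ICD-9 codes on different dates".

```python
from datetime import date

MIN_CASES = 200  # phecodes with fewer cases were not analyzed (from the text)


def classify(code_dates):
    """Classify one participant for one phecode.

    code_dates: dates on which any ICD-9 code mapping to the phecode was
    recorded for this participant (hypothetical input layout).
    """
    n_distinct = len(set(code_dates))
    if n_distinct >= 2:
        return "case"       # at least two codes on different dates
    if n_distinct == 1:
        return "excluded"   # a single instance is neither case nor control
    return "control"        # related-phecode exclusions are NOT modeled here


def is_analyzable(statuses):
    """A phecode is analyzed only if it has at least MIN_CASES cases."""
    return sum(s == "case" for s in statuses) >= MIN_CASES


# Made-up example: two codes recorded on different days make a case.
status = classify([date(2010, 3, 1), date(2011, 6, 15)])
```

Requiring two dated codes trades some sensitivity for specificity, since a single code may reflect a rule-out visit rather than a confirmed diagnosis.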

Statistical analysis

Association tests between single variants in PCSK9, APOB, and LDLR and individual phecodes were performed in the eMERGE discovery cohort, stratified by genetically inferred ancestry (AA and EA individuals) as described above. To include all participants regardless of ancestry, we performed an additional analysis in which all non-AA ancestries were grouped with EA. Logistic regression assuming an additive genetic model was used, with adjustment for the median age at which ICD-9 codes were recorded, sex, and the first two principal components from the genetic ancestry analysis described above. A scree plot showed that the first two principal components captured 79% of the variance (Supplementary Figure 5M). The Bonferroni threshold of significance was defined as 0.05/(number of tested phecodes). PheWAS analyses were repeated with site added as a covariate. The discovery cohort contained 15 additional variants in LD (r2 > 0.3) with rs6511720; these were tested against the hypercholesterolemia code/LDL-C levels, the myopia code, and the myopia code adjusted for the hypercholesterolemia code/LDL-C levels.
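As a worked example of the Bonferroni threshold defined above, the sketch below divides 0.05 by the number of tested phecodes. The function names are hypothetical; the count of 1232 phecodes is the number reported for the EA cohort in the abstract.

```python
FAMILY_ALPHA = 0.05  # family-wise type I error rate


def bonferroni_threshold(n_tests, alpha=FAMILY_ALPHA):
    """Per-test significance threshold: alpha / number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, n_tests):
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests)


# 1232 phecodes were tested in the EA cohort (per the abstract).
ea_threshold = bonferroni_threshold(1232)
print(f"{ea_threshold:.1e}")  # prints 4.1e-05
```

The correction is applied per variant across phecodes, so it controls the error rate within each variant's phenome scan rather than across all variant-phecode pairs at once.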

Cross validation

We used cross validation in the discovery cohort dataset for associated phenotypes. This methodology simulates tests on an independent test dataset and aims to prevent over-fitting.[61] In cross validation, we randomly partitioned the dataset into five equally sized subsets (folds). Each subset in turn was used to test for association, so that every subset was tested exactly once. We combined the results from the five tested folds using Fisher’s method,[62] which corresponds to performing tests on all samples. Cross-validation analysis was repeated with site included as a covariate.
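The fold partition and the combination of per-fold p-values with Fisher's method can be sketched as follows. Fisher's statistic is X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null when the k tests are independent. The function names and the seed are hypothetical, and the per-fold association test itself is not shown.

```python
import math
import random


def make_folds(sample_ids, k=5, seed=0):
    """Randomly partition sample IDs into k folds whose sizes differ by at
    most one. The seed is an arbitrary choice for reproducibility."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    For even degrees of freedom 2k, the chi-square survival function has
    the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no
    statistics library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0                        # i = 0 term of the sum
    for i in range(1, k):
        term *= half_x / i                        # (x/2)^i / i!
        total += term
    return math.exp(-half_x) * total


# Made-up per-fold p-values, one from each of the five folds.
combined = fisher_combined_p([0.04, 0.10, 0.02, 0.20, 0.07])
```

Because the folds are disjoint, the five per-fold tests use non-overlapping participants, which is the independence condition Fisher's method relies on.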

Replication

Significant variant-phecode associations were evaluated in three separate cohorts. The BioVU,[63] Marshfield Clinic Biobank,[64] and the UK Biobank[33] included 29,713, 9562, and 408,455 participants, respectively. To avoid overlap between the discovery and the replication cohorts, the BioVU and Marshfield Clinic Biobank replication cohorts only included individuals who were not eMERGE participants. All UK Biobank participants for whom PheWAS results were available were included in the number above. Replication in the available datasets was defined as p-value < 0.05/number of replicated variants.

Testing association reported in the GWAS catalog

We tested whether previously reported associations for variants in the three lipid metabolism genes were present in the eMERGE dataset and UK Biobank. We collected all variants within the boundaries of the three genes that were listed in the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS catalog.[4] A physician mapped the phenotypes from the GWAS catalog to the closest codes used in the PheWAS package and UK Biobank. The mapping is available in Supplementary Data 2. Unmapped phenotypes were not analyzed further. We tested the variant-phenotype pairs in the eMERGE dataset and extracted the corresponding UK Biobank statistics from the Gene ATLAS PheWAS website. We used a p-value of 0.05 as the threshold for replication.

Power calculation

Power was calculated for each variant-phecode pair[65] for a given sample size, MAF, OR, and type I error rate (significance level) of 0.05/(number of tested phecodes) (α = 4.1 × 10−5). We summarized the power to detect associations in the EA dataset. Additionally, we calculated the post-hoc power for the phecode “ischemic heart disease” (by grouping all ICD-9 codes 411–414), type 2 diabetes, and the 10 tested genetic variants.
  61 in total

1.  Sequence variations in PCSK9, low LDL, and protection against coronary heart disease.

Authors:  Jonathan C Cohen; Eric Boerwinkle; Thomas H Mosley; Helen H Hobbs
Journal:  N Engl J Med       Date:  2006-03-23       Impact factor: 91.245

2.  Development of a large-scale de-identified DNA biobank to enable personalized medicine.

Authors:  D M Roden; J M Pulley; M A Basford; G R Bernard; E W Clayton; J R Balser; D R Masys
Journal:  Clin Pharmacol Ther       Date:  2008-05-21       Impact factor: 6.875

3.  Genome-wide association study to identify single nucleotide polymorphisms (SNPs) associated with the development of erectile dysfunction in African-American men after radiotherapy for prostate cancer.

Authors:  Sarah L Kerns; Harry Ostrer; Richard Stock; William Li; Julian Moore; Alexander Pearlman; Christopher Campbell; Yongzhao Shao; Nelson Stone; Lynda Kusnetz; Barry S Rosenstein
Journal:  Int J Radiat Oncol Biol Phys       Date:  2010-12-01       Impact factor: 7.038

4.  Statins and risk of incident diabetes: a collaborative meta-analysis of randomised statin trials.

Authors:  Naveed Sattar; David Preiss; Heather M Murray; Paul Welsh; Brendan M Buckley; Anton J M de Craen; Sreenivasa Rao Kondapally Seshasai; John J McMurray; Dilys J Freeman; J Wouter Jukema; Peter W Macfarlane; Chris J Packard; David J Stott; Rudi G Westendorp; James Shepherd; Barry R Davis; Sara L Pressel; Roberto Marchioli; Rosa Maria Marfisi; Aldo P Maggioni; Luigi Tavazzi; Gianni Tognoni; John Kjekshus; Terje R Pedersen; Thomas J Cook; Antonio M Gotto; Michael B Clearfield; John R Downs; Haruo Nakamura; Yasuo Ohashi; Kyoichi Mizuno; Kausik K Ray; Ian Ford
Journal:  Lancet       Date:  2010-02-16       Impact factor: 79.321

5.  The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies.

Authors:  Catherine A McCarty; Rex L Chisholm; Christopher G Chute; Iftikhar J Kullo; Gail P Jarvik; Eric B Larson; Rongling Li; Daniel R Masys; Marylyn D Ritchie; Dan M Roden; Jeffery P Struewing; Wendy A Wolf
Journal:  BMC Med Genomics       Date:  2011-01-26       Impact factor: 3.063

Review 6.  Apolipoprotein B levels, APOB alleles, and risk of ischemic cardiovascular disease in the general population, a review.

Authors:  Marianne Benn
Journal:  Atherosclerosis       Date:  2009-01-15       Impact factor: 5.162

Review 7.  The heritability of human disease: estimation, uses and abuses.

Authors:  Albert Tenesa; Chris S Haley
Journal:  Nat Rev Genet       Date:  2013-02       Impact factor: 53.242

Review 8.  Abundant pleiotropy in human complex diseases and traits.

Authors:  Shanya Sivakumaran; Felix Agakov; Evropi Theodoratou; James G Prendergast; Lina Zgaga; Teri Manolio; Igor Rudan; Paul McKeigue; James F Wilson; Harry Campbell
Journal:  Am J Hum Genet       Date:  2011-11-11       Impact factor: 11.025

9.  Can genetic pleiotropy replicate common clinical constellations of cardiovascular disease and risk?

Authors:  Omri Gottesman; Esther Drill; Vaneet Lotay; Erwin Bottinger; Inga Peter
Journal:  PLoS One       Date:  2012-09-28       Impact factor: 3.240

10.  A genome-wide association study identifies susceptibility loci for Wilms tumor.

Authors:  Clare Turnbull; Elizabeth R Perdeaux; David Pernet; Arlene Naranjo; Anthony Renwick; Sheila Seal; Rosa Maria Munoz-Xicola; Sandra Hanks; Ingrid Slade; Anna Zachariou; Margaret Warren-Perry; Elise Ruark; Mary Gerrard; Juliet Hale; Martin Hewitt; Janice Kohler; Sheila Lane; Gill Levitt; Mabrook Madi; Bruce Morland; Veronica Neefjes; James Nicholson; Susan Picton; Barry Pizer; Milind Ronghe; Michael Stevens; Heidi Traunecker; Charles A Stiller; Kathy Pritchard-Jones; Jeffrey Dome; Paul Grundy; Nazneen Rahman
Journal:  Nat Genet       Date:  2012-04-29       Impact factor: 38.330
