
Detecting potential pleiotropy across cardiovascular and neurological diseases using univariate, bivariate, and multivariate methods on 43,870 individuals from the eMERGE network.

Xinyuan Zhang, Yogasudha Veturi, Shefali Verma, William Bone, Anurag Verma, Anastasia Lucas, Scott Hebbring, Joshua C Denny, Ian B Stanaway, Gail P Jarvik, David Crosslin, Eric B Larson, Laura Rasmussen-Torvik, Sarah A Pendergrass, Jordan W Smoller, Hakon Hakonarson, Patrick Sleiman, Chunhua Weng, David Fasel, Wei-Qi Wei, Iftikhar Kullo, Daniel Schaid, Wendy K Chung, Marylyn D Ritchie.

Abstract

The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we were interested in detecting pleiotropy, the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category, followed by an eQTL colocalization analysis, to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. All three methods identified 31 variants that showed significant associations across late-onset cardiac and neurologic diseases. We further investigated the functional implications of gene expression for the detected "lead SNPs" via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease-causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.

Year: 2019; PMID: 30864329; PMCID: PMC6457436

Source DB: PubMed; Journal: Pac Symp Biocomput; ISSN: 2335-6928


Introduction

Cognitive decline has been observed in nearly 42% of elderly individuals five years after cardiac surgery[1]. Of late, there has been increasing clinical evidence suggesting a link between cardiovascular and neurological diseases. To facilitate efficient disease prevention and treatment for cardiovascular and neurological diseases, it is imperative to understand the underlying, often unexplained, disease-causing mechanisms across multiple phenotypes. Pleiotropy is the phenomenon in which a specific allele influences two or more unrelated phenotypes. While there has been evidence of polygenic pleiotropy (where multiple variants are causally associated with multiple traits) among cardiovascular[2] and neurological diseases[3], recent work has also demonstrated a genetic basis for the link between these disease groupings. In particular, there has been evidence of genetic overlap between cardiovascular disease and (a) multiple sclerosis[4] as well as (b) schizophrenia[5]. Large-scale genomics data coupled with electronic health record (EHR) data can enhance our ability to uncover novel cross-phenotype associations and potentially pleiotropic variants, although a cross-phenotype association could also be an artifact of linkage disequilibrium (LD) or disease co-morbidities rather than true pleiotropy[6]. In this study, we sought to identify common genetic variants that contribute to the link between diseases of the circulatory and nervous systems using 43,870 unrelated European adults and 65 disease phenotypes from the Electronic Medical Records and Genomics (eMERGE) network.
Statistical approaches to detect pleiotropy across multiple phenotypes can be univariate (CPMA[7], ASSET[8], MultiMeta[9], GPA[10], MTAG[11], etc.), bivariate, or multivariate (MTMM[12], MultiPhen[13], GEMMA[14], mvLMM[15], mvBIMBAM[16], etc.), in addition to network-based approaches, among others[17]. Univariate methods (e.g. phenome-wide association studies or PheWAS) are a powerful way to characterize the effect of a genetic variant on each phenotype independently, and potential pleiotropy can be detected when the same SNP is found to be significantly associated with multiple phenotypes. This approach has shown great success in identifying potential pleiotropy in several clinical genomics studies[18-23]. However, a limitation of univariate analysis is that it tests only one trait at a time, so it cannot be a formal test of pleiotropy.
In contrast, bivariate analysis has been shown to have higher power than univariate analysis because it analyzes pairs of phenotypes simultaneously[24]. Furthermore, because bivariate analysis can be structured to test the association of a trait with a variant while adjusting for another trait’s association with the variant, bivariate analyses can be constructed to formally test pleiotropy and can be extended to multivariate traits to perform sequential tests for pleiotropic effects[25,26]. In this study, we used a bivariate approach based on summary statistics from univariate analysis to test the hypothesis of “joint association” of a SNP with a trait pair while accounting for the correlation in z-scores between the trait pair[24]. The alternative hypothesis here is that at least one of the two traits is significantly associated with the SNP marker. This implementation of bivariate analysis has suggested potential pleiotropy as well as hinted at underlying disease-causing mechanisms in many recent studies[27,28].
Finally, multivariate analysis is designed to test the joint association between a genotype and multiple phenotypes in a single regression model. Multivariate analysis has been shown to have increased power over univariate analysis in many scenarios, including when the genotype affects either a single phenotype or multiple correlated phenotypes[29,30]. We chose MultiPhen[13] to perform multivariate analysis because of its ability to handle binary phenotypes as well as its high power, as demonstrated via simulations[29]. In this paper, we refer to MultiPhen as multivariate analysis for the sake of convenience. Again, the alternative hypothesis here is that at least one of many traits is significantly associated with the SNP marker.
Since the “true” pleiotropic associations among cardiovascular diseases and neurological disorders are largely unknown, we applied three types of widely used methods to characterize the landscape of potential pleiotropy at the genome-wide level[31,32]. To improve our confidence that each potential pleiotropic variant obtained across all three methods reflects a single causal variant instead of coincidental overlap, we performed statistical colocalization for these signals with gene expression datasets across all 48 available tissues from the Genotype-Tissue Expression (GTEx) consortium[33]. For instance, if a SNP colocalizes with an eQTL for traits A and B, it means that the same SNP associates with both (a) gene expression and trait A and (b) gene expression and trait B. This can help us infer that the same SNP associates with both traits A and B and is likely pleiotropic. We found that many of the potentially pleiotropic signals associated with both disease groupings (diseases of the nervous and circulatory systems) colocalized with eQTLs from the GTEx consortium (especially on chromosome 22), indicating that gene expression might be influencing risk of disease at those loci. This study is one of the first large-scale real-data applications and evaluations of univariate, bivariate, multivariate, and colocalization methods in one comprehensive analysis. The overall study design is shown in Figure 1.
Figure 1.

Overview of the analysis plan

Methods

eMERGE network

In this study, we used data from the Electronic Medical Records and Genomics (eMERGE) network Phase III. The eMERGE network is a National Human Genome Research Institute (NHGRI) organized consortium to explore the utility of DNA biorepositories coupled with Electronic Health Record (EHR) systems for large-scale genomic research. The eMERGE network Phase III consists of 83,717 genotyped samples across multiple platforms that are imputed to Haplotype Reference Consortium 1.1 reference in genome build 37 covering ~39 million genetic variants. There are seven eMERGE adult sites included in our study: Marshfield Clinic Research Foundation, Vanderbilt University Medical Center, Kaiser Permanente Washington/University of Washington, Mayo Clinic, Northwestern University, Geisinger, and Harvard University.

Genotypic Data and Quality Control

eMERGE Phase III imputed genotypic data were cleaned following the “best-practice” quality control (QC) pipeline designed for imputed data[34]. We included genetic variants with genotype call rate > 99% and samples with sample call rate > 99%. We selected common variants with minor allele frequency (MAF) > 0.05. To account for sample relatedness, we dropped one individual from each related pair with pi_hat > 0.25 (obtained from identity-by-descent estimation using PLINK[35]). We filtered out variants that had a linkage disequilibrium r² greater than 0.5 using a 100 kb sliding window. We also filtered out variants with a mean imputation score less than or equal to 0.4. We further removed variants with a MAF difference greater than 0.1 compared with the European population from the 1000 Genomes Project[34]. After genotypic QC assessment and LD pruning, we had 54,942 unrelated individuals of European ancestry and 533,878 SNPs.

Phenotype Definition and Selection Criteria

Phenotype Definition

Cardiovascular and neurological phenotypes were defined using International Classification of Diseases, Ninth Revision (ICD-9) billing codes. We selected 98 ICD-9 codes from “Diseases of the circulatory systems” and “Diseases of nervous system and sense organs” as our primary phenotypes. Table 1 presents the major disease groups and corresponding ICD-9 codes. Of note, association analyses were performed using individual ICD-9 codes to define case/control status, and we used broader major disease categories for the purpose of presentation. The number of clinical visits per ICD-9 code per individual was used to define case-control status for each ICD-9 code: a case would be assigned if an individual had ≥ 3 instances; a control would be assigned if an individual had zero instances; an NA would be assigned if an individual had one or two instances[22].
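The case/control rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `case_status` and the example counts are hypothetical, and `None` stands in for NA.

```python
from typing import Optional


def case_status(n_instances: int) -> Optional[int]:
    """Apply the "rule of three" to the number of visits carrying one ICD-9 code.

    Returns 1 (case) for three or more instances, 0 (control) for zero
    instances, and None (NA) for one or two instances.
    """
    if n_instances < 0:
        raise ValueError("instance count cannot be negative")
    if n_instances >= 3:
        return 1
    if n_instances == 0:
        return 0
    return None  # one or two instances: status left missing


# Hypothetical visit counts for a single ICD-9 code.
counts = {"ind_A": 0, "ind_B": 2, "ind_C": 5}
statuses = {person: case_status(n) for person, n in counts.items()}
print(statuses)
```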
Table 1.

Major group and ICD-9 category of neurological disorders and cardiovascular diseases

| Major Group | ICD-9 Category | Codes |
| Circulatory System | Chronic rheumatic heart disease | 393–398 |
|  | Hypertensive disease | 401–405 |
|  | Ischemic heart disease | 410–414 |
|  | Diseases of pulmonary circulation | 415–417 |
|  | Other forms of heart disease | 420–429 |
|  | Cerebrovascular disease | 430–438 |
|  | Diseases of blood vessels | 440–449 |
|  | Other diseases of circulatory system | 451–459 |
| Nervous System | Inflammatory diseases of the central nervous system | 320–327 |
|  | Hereditary and degenerative diseases of the central nervous system | 330–337 |
|  | Pain | 338 |
|  | Disorders of the central nervous system | 340–349 |
|  | Disorders of the peripheral nervous system | 350–359 |

Phenotype Selection Criteria

Our cohort comprised adults of European ancestry (age ≥ 25 years) from eMERGE network Phase III. We only used ICD-9 codes with at least 200 cases to increase the statistical power of association tests[36]. As a result, a total of 65 cardiovascular and neurological ICD-9-based diagnoses and 43,870 individuals were included in our final round of association analyses. Individuals who had both cardiovascular and neurological disease were counted as cases for both. The sample size distribution of the 65 phenotypes is shown in Figure 2.
Figure 2.

Sample size distribution for 65 ICD-9 disease categories

Association Methods

Univariate Analysis

We performed univariate logistic regression using 65 ICD-9 based diagnoses with 533,878 variants. We adjusted logistic regression models for sex, age, eMERGE site, and the first six principal components. We used PLINK 1.90 software[35] to perform the first round of univariate analysis because of its high computational efficiency. The logistic regression models converged for 33 out of 65 phenotypes. The major reason contributing to the non-convergence was the low sample sizes corresponding to some of the sites when we adjusted for eMERGE site (7 levels) as a categorical covariate. To address this, we used PLATO 2.1.0[37] to perform the second round of logistic regression tests on the remaining 32 phenotypes with the same set of covariates as before. Since PLATO implements an increased number of iterations compared to PLINK to find the best solution for logistic models, the software achieved convergence for all the remaining models. It should be noted that when both PLINK and PLATO converge, the results are concordant; these tools have been extensively compared previously[37].

Bivariate Analysis

Bivariate analysis involved using summary-statistics (Z scores) from univariate analyses. We modeled our bivariate analysis protocol (with modifications) on the one followed by Siewert et al[27]. We first estimated mean and covariance of the Z-scores obtained from univariate analyses for each of the 2080 pairs of phenotypes using all the available LD-pruned SNPs. This was done to ensure a null bivariate normal distribution of Z scores for each pair of phenotypes and to satisfy the “independence” assumption for hypothesis testing. Subsequently, we applied a p-value threshold of 0.005 on the univariate GWAS results and filtered out any SNPs that did not meet this threshold. We also filtered out SNPs with MAF = 0.5 to remove ambiguity pertaining to which allele was chosen as the referent allele in univariate analyses. Finally, we identified a list of common SNPs and estimated a p-value for each of 2,080 “pairs” of phenotypes using a chi-squared test with two degrees of freedom. Although we conducted a reduced number of tests, it should be noted that we corrected for multiple comparisons using the original “unfiltered” SNP set in order to control our type I error rate well.
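As a rough sketch of the test described above: under the null, the pair of Z-scores for a SNP is treated as bivariate normal with the estimated mean and covariance, so the quadratic form below follows a chi-squared distribution with two degrees of freedom, whose upper tail is exp(-T/2). The function names are hypothetical, and the authors' modified protocol may differ in detail from this textbook form.

```python
import math

# 65 phenotypes give 65 * 64 / 2 = 2080 unordered trait pairs.
N_PAIRS = math.comb(65, 2)


def estimate_null(z1, z2):
    """Estimate the mean and covariance of two Z-score vectors (one entry per SNP)."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    v1 = sum((a - m1) ** 2 for a in z1) / (n - 1)
    v2 = sum((b - m2) ** 2 for b in z2) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (n - 1)
    return (m1, m2), (v1, c12, v2)


def bivariate_test(z1, z2, mean, cov):
    """Return (statistic, p-value) for one SNP and one trait pair."""
    m1, m2 = mean
    v1, c12, v2 = cov
    det = v1 * v2 - c12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = z1 - m1, z2 - m2
    # Quadratic form d' * inverse(Sigma) * d, written out for the 2x2 case.
    stat = (v2 * d1 * d1 - 2 * c12 * d1 * d2 + v1 * d2 * d2) / det
    # Upper tail of a chi-squared distribution with 2 degrees of freedom.
    return stat, math.exp(-stat / 2)


# Hypothetical example: uncorrelated traits with unit variance.
stat, p = bivariate_test(3.0, 4.0, mean=(0.0, 0.0), cov=(1.0, 0.0, 1.0))
print(N_PAIRS, stat, p)
```

The closed-form tail probability avoids a dependency on a statistics library; it holds only for two degrees of freedom.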

Multivariate Analysis

We performed multivariate analysis using the MultiPhen 2.0.2 R package[13]. MultiPhen analyzes multiple phenotypes jointly by testing linear combinations of phenotypes against each SNP using reverse ordinal regression. We adjusted for the same set of covariates as in the univariate tests. By default, MultiPhen excludes any individual with at least one NA among the 65 phenotypes. Under this default, the power of association tests would be limited, as only 7,535 individuals would remain in total, with extremely low case sample sizes per phenotype. The NAs arise because we applied the “rule of three” to define a case: any person who had one or two instances of an ICD-9 code was set to missing (NA) for that code. Because we did not want to drop so many individuals, we needed to fill in an alternative value for each NA. For the purposes of multivariate analyses, these missing values were replaced by 0.5 to retain a sample size comparable with the univariate and bivariate analyses (sensitivity analyses on the top significant SNPs yielded comparable results; see Discussion). These individuals are likely cases since they have the ICD code in their record one or two times. A detailed evaluation of this replacement strategy will be conducted in the future to determine whether a better imputation strategy exists. Finally, to increase the computational efficiency of MultiPhen, we parallelized the runs by splitting the genome into chunks of 10 Mb each.
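The effect of the two missing-data choices can be illustrated with a toy phenotype matrix. This sketch only mimics the bookkeeping (complete-case filtering versus replacing NA with 0.5); it does not call MultiPhen, and the helper names and data are hypothetical.

```python
def complete_cases(rows):
    """Keep only individuals with no missing phenotype (None stands in for NA)."""
    return [row for row in rows if all(value is not None for value in row)]


def fill_missing(rows, fill=0.5):
    """Replace every missing phenotype value with `fill`, keeping all individuals."""
    return [[fill if value is None else value for value in row] for row in rows]


# Hypothetical matrix: one row per individual, one column per phenotype.
phenotypes = [
    [1, 0, None],
    [0, 0, 0],
    [None, 1, 0],
    [1, 1, 0],
]

kept = complete_cases(phenotypes)
filled = fill_missing(phenotypes)
print(len(kept), len(filled))
```

With 65 phenotypes, a single NA anywhere in a row removes the individual under complete-case filtering, which is why the filled matrix retains far more rows.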

Statistical Correction

We implemented two Bonferroni correction strategies to adjust for multiple testing when comparing the statistical performance of the three types of methods. In each case, the Bonferroni threshold was calculated by dividing the level of significance (0.05) by the number of tests. In the first strategy (“method-specific Bonferroni”), we calculated the Bonferroni threshold separately for each method. The derived significance thresholds for univariate, bivariate, and multivariate testing were 1.44×10^-9 [0.05/(65×533,878)], 4.50×10^-11 [0.05/(2,080×533,878)], and 9.37×10^-8 [0.05/533,878], respectively. The threshold for bivariate analyses is overly conservative because of the potential non-independence of tests (even after LD pruning). In the second strategy (“family-wise Bonferroni”), we calculated the Bonferroni threshold based on the total number of tests across all three methods. The derived significance threshold was 4.36×10^-11 [0.05/(65×533,878 + 2,080×533,878 + 533,878)], and this criterion was applied across all three methods. Again, this correction is overly conservative given the correlation across the tests and methods, but it offers good control of the type I error rate.
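The four thresholds can be reproduced from the counts given in the text (65 phenotypes, 2,080 phenotype pairs, 533,878 SNPs). The sketch below is plain arithmetic; the variable names are illustrative.

```python
ALPHA = 0.05
N_SNPS = 533_878
N_PHENOTYPES = 65
N_PAIRS = N_PHENOTYPES * (N_PHENOTYPES - 1) // 2  # 2080 trait pairs


def bonferroni(alpha: float, n_tests: int) -> float:
    """Bonferroni threshold: significance level divided by the number of tests."""
    return alpha / n_tests


# Method-specific thresholds: one SNP-level test per phenotype, pair, or SNP.
univariate_threshold = bonferroni(ALPHA, N_PHENOTYPES * N_SNPS)
bivariate_threshold = bonferroni(ALPHA, N_PAIRS * N_SNPS)
multivariate_threshold = bonferroni(ALPHA, N_SNPS)

# Family-wise threshold: all tests from the three methods pooled.
family_wise_threshold = bonferroni(
    ALPHA, N_PHENOTYPES * N_SNPS + N_PAIRS * N_SNPS + N_SNPS
)

for name, value in [
    ("univariate", univariate_threshold),
    ("bivariate", bivariate_threshold),
    ("multivariate", multivariate_threshold),
    ("family-wise", family_wise_threshold),
]:
    print(f"{name}: {value:.2e}")
```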

Colocalization

Finally, we performed colocalization analysis to have greater confidence in our assessment of pleiotropy. We first obtained a list of potentially pleiotropic variants that cleared the “family-wise Bonferroni” multiple comparison threshold for univariate, bivariate and multivariate methods and narrowed down this list to SNPs that were associated with at least one disease from both nervous and circulatory systems. Finally, we ensured that for any given SNP, if one of the two traits in this circulatory-nervous trait pair had a univariate p-value that did not meet the “family-wise Bonferroni” threshold, it had a univariate -log10 p-value of at least 3. We termed the final list of SNPs as our “lead” SNPs. To test if these signals were being influenced by gene expression as well as driven by the same underlying variant, we performed statistical colocalization analyses using the “coloc” R package[38] between these signals and eQTLs (across all 48 available tissues) from the GTEx consortium[33]. We first obtained a 200KB window on either side of a “lead” SNP and looked for whether the lead SNP (or one in close LD with it) was an eQTL in a given tissue. If it was not an eQTL, that lead SNP was ignored. If it was an eQTL for a given tissue, we identified the corresponding “eGene” and obtained summary statistics from GTEx for all gene-variant associations in that 200KB window (either side). Note that we only chose the eGene that had the smallest p-value for a given eQTL from GTEx. Finally, for each phenotype with which the lead SNP is significantly associated, we performed statistical colocalization between the SNP and the corresponding eQTL in that tissue. We set a coloc threshold of PP4/(PP3+PP4) > 0.8 to identify pleiotropic signals that are strongly influenced by gene expression. 
Here PP4 refers to the posterior probability that a single SNP associates with the phenotype as well as the gene expression whereas PP3 refers to the posterior probability of having two independent SNPs associate with either.
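The colocalization criterion is a simple ratio of two posterior probabilities, so it can be checked independently of the “coloc” package once PP3 and PP4 are available. The sketch below assumes those two values have already been computed elsewhere; the function names and the example probabilities are hypothetical.

```python
def colocalization_ratio(pp3: float, pp4: float) -> float:
    """Return PP4 / (PP3 + PP4): support for one shared variant versus two distinct ones."""
    total = pp3 + pp4
    if total <= 0:
        raise ValueError("PP3 + PP4 must be positive")
    return pp4 / total


def passes_coloc_threshold(pp3: float, pp4: float, threshold: float = 0.8) -> bool:
    """Apply the PP4 / (PP3 + PP4) > 0.8 rule from the text."""
    return colocalization_ratio(pp3, pp4) > threshold


# Hypothetical posterior probabilities for two SNP-tissue-phenotype combinations.
print(passes_coloc_threshold(pp3=0.05, pp4=0.60))
print(passes_coloc_threshold(pp3=0.30, pp4=0.60))
```

Conditioning on PP3 + PP4 focuses the comparison on loci where both the trait and the expression signal are present, so a locus can pass even when PP4 alone is modest.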

Results

Landscape of Univariate, Bivariate and Multivariate Associations

The landscape of univariate, bivariate, and multivariate association results is shown in Figure 3. Univariate and bivariate analyses show a broadly similar trend of association signals. We found that bivariate analysis identified more significant associations than univariate analysis when the correlation between phenotypes was low (less than 0.4). The bottom half of Figure 3 shows whether each association signal from bivariate analysis comes from a pair of circulatory traits, a pair of nervous traits, or a circulatory-nervous pair. Black dots in Figure 3 represent the variants that passed “method-specific Bonferroni” significance in multivariate analysis. There are scenarios in which univariate and bivariate analyses yield no significant association but multivariate analysis does. Using the “method-specific Bonferroni” threshold, univariate, bivariate, and multivariate methods detected 124, 108, and 107 unique statistically significant SNPs, respectively, and 49 SNPs overlapped across the three methods (data not shown). The number of variants detected at the more stringent “family-wise” threshold is given in Figure 4.
Figure 3.

Univariate, Bivariate and Multivariate Results

A position-by-position comparison of genetic associations for univariate, bivariate, and multivariate methods using code modified from the Hudson R package (https://github.com/anastasia-lucas/hudson). The horizontal axis represents genomic locations by chromosome and the vertical axis represents −log10(p-value). Colors represent major disease groups of the circulatory and nervous systems. The top plot presents univariate results with p-value less than 0.01 as triangles and multivariate results that passed the “method-specific Bonferroni” threshold as black dots. The bottom plot presents bivariate analysis results as two-colored circles, denoting the two phenotypes with which a variant is associated. The red lines in both plots mark the “family-wise Bonferroni” threshold.

Figure 4.

Venn diagram of the number of SNPs obtained at a “family-wise Bonferroni” threshold

Variants associated with cardiovascular disease and neurological disorders

Among the 31 SNPs that passed the “family-wise Bonferroni” threshold across all three methods, we obtained 9 unique variants that were significantly associated with at least one cardiovascular disease and one neurological disorder in bivariate analysis and that also “colocalized” with eQTLs across a host of tissues at the coloc PP4/(PP3+PP4) threshold of 0.8. Table 2 shows a comprehensive summary of these 9 variants. Our colocalization analyses indicated whether there was a shared variant underlying our potentially pleiotropic signals and whether gene expression may be influencing disease risk at these loci. For instance, the SNP at chromosome 1 position 36822024 (rs10796883) colocalized with eQTLs in the same 35 tissues for “Muscular dystrophies and other myopathies”, “Pain” and “Other conditions of the brain” (neurological phenotypes) as well as “Heart failure”, “Essential hypertension”, “Cardiac dysrhythmias” and “Hypotension” (cardiovascular phenotypes) (eGenes: EVA1B, TRAPPC3). This suggests that rs10796883 may influence 4 different cardiovascular disease categories and 3 different neurological disease categories, as well as expression of the eGenes EVA1B and TRAPPC3 across 35 different tissues. Likewise, the variant at chromosome 22 position 22947156 (rs2097594) colocalized with eQTLs in 4 tissues (brain cerebellum, testis, transformed fibroblasts, small intestine ileum) for 4 different neurological phenotypes as well as 9 cardiovascular phenotypes (eGenes: IGLV3-21, GGTLC2). Please refer to Supplementary table 1 at https://ritchielab.org/files/PSB2019/Veturi/Supplementary_Data_1.txt for a complete list of tissues in which each of the lead SNPs colocalizes with eQTLs.
Table 2.

Potential pleiotropic SNPs and their associated disease groups

| SNP | Circulatory phenotype (univariate −log10 p) | Nervous phenotype (univariate −log10 p) | Bivariate −log10 p | Multivariate −log10 p | Tissue count | eGenes |
| 1:36822024 (rs10796883) | Cardiac dysrhythmias (11.305) | Muscular dystrophies and other myopathies (4.921) | 13.247 | 11.165 | 35 | EVA1B, TRAPPC3 |
|  |  | Other conditions of brain (3.451) | 12.030 |  | 35 | EVA1B, TRAPPC3 |
|  |  | Pain (4.151) | 12.363 |  | 35 | EVA1B, TRAPPC3 |
|  | Essential hypertension (9.125) | Muscular dystrophies and other myopathies (4.921) | 11.325 |  | 35 | EVA1B, TRAPPC3 |
|  | Heart failure (10.029) | Muscular dystrophies and other myopathies (4.921) | 11.988 |  | 35 | EVA1B, TRAPPC3 |
|  |  | Pain (4.151) | 11.452 |  | 35 | EVA1B, TRAPPC3 |
|  | Hypotension (8.660) | Muscular dystrophies and other myopathies (4.921) | 10.699 |  | 35 | EVA1B, TRAPPC3 |
| 6:32569056 (rs9270779) | Atherosclerosis (14.165) | Multiple sclerosis (6.355) | 18.112 | 10.861 | 8 | HLA-DRB5, HLA-DRB9 |
|  |  | Parkinson’s disease (3.196) | 15.097 |  | 11 | HLA-DRB5, HLA-DRB9 |
|  | Occlusion and stenosis of precerebral arteries (6.355) | Multiple sclerosis (5.913) | 10.400 |  | 7 | HLA-DRB5, HLA-DRB9 |
|  | Other peripheral vascular disease (6.355) | Multiple sclerosis (7.442) | 11.787 |  | 4 | HLA-DRB5, HLA-DRB9 |
| 14:106995720 (rs7160440) | Cardiac dysrhythmias (11.322) | Muscular dystrophies and other myopathies (4.394) | 12.989 | 18.291 | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  |  | Other conditions of brain (3.726) | 12.420 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  |  | Pain (6.297) | 14.259 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  | Essential hypertension (7.451) | Pain (6.297) | 10.610 |  | 1 | IGHV3-49 |
|  | Heart failure (9.038) | Muscular dystrophies and other myopathies (4.394) | 10.752 |  | 8 | IGHV3-53, IGHV4-39, IGHV3-49, HOMER2P1 |
|  |  | Other conditions of brain (3.726) | 10.469 |  | 6 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  |  | Pain (6.297) | 12.465 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  | Hypertensive chronic kidney disease (8.116) | Pain (6.297) | 11.623 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  | Hypotension (10.278) | Muscular dystrophies and other myopathies (4.394) | 11.832 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  |  | Other conditions of brain (3.726) | 11.252 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  |  | Pain (6.297) | 13.004 |  | 5 | IGHV3-53, IGHV4-39, IGHV3-49 |
|  | Ill-defined descriptions and complications of heart disease (7.610) | Pain (6.297) | 11.224 |  | 1 |  |
| 22:22876236 (rs361535) | Other forms of chronic ischemic heart disease (4.985) | Inflammatory and toxic neuropathy (14.211) | 14.702 | 10.424 | 1 |  |
| 22:22947156 (rs2097594) | Cardiac dysrhythmias (10.930) | Inflammatory and toxic neuropathy (3.011) | 11.236 | 28.019 | 1 |  |
|  |  | Muscular dystrophies and other myopathies (3.773) | 12.116 |  | 1 |  |
|  |  | Other conditions of brain (3.328) | 11.738 |  | 1 |  |
|  |  | Pain (5.622) | 13.348 |  | 1 |  |
|  | Cardiomyopathy (12.330) | Inflammatory and toxic neuropathy (3.011) | 12.818 |  | 2 | GGTLC2 |
|  |  | Muscular dystrophies and other myopathies (3.773) | 13.768 |  | 2 | IGLV3-21, GGTLC2 |
|  |  | Other conditions of brain (3.328) | 13.507 |  | 1 | GGTLC2 |
|  |  | Pain (5.622) | 15.503 |  | 2 | GGTLC2 |
|  | Essential hypertension (10.187) | Muscular dystrophies and other myopathies (3.773) | 11.380 |  | 2 | BCRP4 |
|  |  | Other conditions of brain (3.328) | 10.968 |  |  |  |
|  |  | Pain (5.622) | 12.386 |  |  |  |
|  | Heart failure (20.621) | Inflammatory and toxic neuropathy (3.011) | 19.807 |  | 2 | GGTLC2 |
|  |  | Muscular dystrophies and other myopathies (3.773) | 20.963 |  | 3 | IGLV3-21, GGTLC2 |
|  |  | Other conditions of brain (3.328) | 21.000 |  | 2 | GGTLC2 |
|  |  | Pain (5.622) | 22.553 |  | 2 | GGTLC2 |
|  | Hypertensive chronic kidney disease (9.331) | Muscular dystrophies and other myopathies (3.773) | 10.760 |  | 2 | GGTLC2 |
|  |  | Pain (5.622) | 12.119 |  | 2 | GGTLC2 |
|  | Hypotension (9.778) | Muscular dystrophies and other myopathies (3.773) | 10.883 |  | 2 | GGTLC2 |
|  |  | Other conditions of brain (3.328) | 10.491 |  | 2 | GGTLC2 |
|  |  | Pain (5.622) | 12.026 |  | 2 | GGTLC2 |
|  | Ill-defined descriptions and complications of heart disease (10.665) | Inflammatory and toxic neuropathy (3.011) | 10.863 |  | 2 | GGTLC2 |
|  |  | Muscular dystrophies and other myopathies (3.773) | 11.703 |  | 2 | GGTLC2 |
|  |  | Other conditions of brain (3.328) | 11.478 |  | 2 | GGTLC2 |
|  |  | Pain (5.622) | 13.385 |  | 2 | GGTLC2 |
|  | Other diseases of endocardium (10.340) | Inflammatory and toxic neuropathy (10.340) | 11.032 |  |  |  |
|  |  | Muscular dystrophies and other myopathies (10.340) | 11.844 |  |  |  |
|  |  | Other conditions of brain (10.340) | 11.617 |  |  |  |
|  |  | Pain (5.622) | 13.627 |  |  |  |
|  | Other forms of chronic ischemic heart disease (11.873) | Inflammatory and toxic neuropathy (11.873) | 11.335 |  |  |  |
|  |  | Muscular dystrophies and other myopathies (11.873) | 12.690 |  |  |  |
|  |  | Other conditions of brain (11.873) | 12.530 |  |  |  |
|  |  | Pain (5.622) | 14.168 |  |  |  |
| 22:25420792 (rs13056641) | Cardiac dysrhythmias (9.528) | Inflammatory and toxic neuropathy (4.159) | 10.817 | 40.505 | 11 | KIAA1671, SGSM1, CRYBB2, CRYBB3, IGLL3P |
|  |  | Organic sleep disorders (4.166) | 10.687 |  | 1 | IGLL3P |
|  |  | Pain (4.590) | 11.247 |  | 6 | KIAA1671, IGLL3P |
|  | Essential hypertension (12.162) | Inflammatory and toxic neuropathy (4.159) | 12.620 |  | 16 | KIAA1671, SGSM1, CRYBB2, CRYBB3, IGLL3P, BCRP3 |
|  |  | Organic sleep disorders (4.166) | 12.521 |  | 1 | IGLL3P |
|  |  | Pain (4.590) | 13.284 |  | 7 | KIAA1671, IGLL3P |
| 22:25436904 (rs1040421) | Angina pectoris (3.067) | Pain (13.338) | 15.015 | 58.239 | 7 | KIAA1671, SGSM1, IGLL3P |
|  | Atherosclerosis (5.075) | Pain (13.338) | 15.580 |  | 8 | KIAA1671, SGSM1, IGLL3P |
|  | Cardiac dysrhythmias (11.931) | Pain (13.338) | 20.872 |  | 7 | KIAA1671, SGSM1, IGLL3P |
|  | Cardiomyopathy (4.939) | Pain (13.338) | 15.904 |  | 8 | KIAA1671, SGSM1, IGLL3P |
|  | Conduction disorders (5.764) | Pain (13.338) | 16.372 |  | 5 | KIAA1671, SGSM1, IGLL3P |
|  | Essential hypertension (10.303) | Pain (13.338) | 19.175 |  | 8 | KIAA1671, SGSM1, IGLL3P |
|  | Heart failure (7.101) | Pain (13.338) | 17.129 |  | 8 | KIAA1671, SGSM1, IGLL3P |
|  | Hypertensive chronic kidney disease (7.426) | Pain (13.338) | 17.404 |  | 8 | KIAA1671, SGSM1, IGLL3P |
|  | Hypotension (6.693) | Pain (13.338) | 16.037 |  | 4 | KIAA1671, SGSM1, IGLL3P |
|  | Other diseases of endocardium (5.845) | Pain (13.338) | 16.677 |  | 4 | KIAA1671, SGSM1, IGLL3P |
| 22:28250172 (rs1997739) | Cardiac dysrhythmias (10.517) | Pain (4.966) | 12.443 | 22.064 | 19 | ZNRF3, TTC28-AS1 |
| 22:33079917 (rs5749490) | Cardiac dysrhythmias (11.280) | Hereditary and idiopathic peripheral neuropathy (3.049) | 11.884 | 23.601 | 9 | FBXO7, SLC5A4-AS1 |
|  |  | Inflammatory and toxic neuropathy (3.958) | 12.254 |  | 2 | FBXO7, SLC5A4-AS1 |
|  |  | Mononeuritis of lower limb and unspecified site (3.153) | 12.242 |  | 2 | FBXO7, SLC5A4-AS1 |
|  |  | Pain (8.424) | 16.011 |  | 9 | FBXO7, SLC5A4-AS1 |
|  | Hypertensive chronic kidney disease (6.449) | Pain (8.424) | 12.064 |  | 9 | FBXO7, SLC5A4-AS1 |
|  | Hypertensive heart disease (4.191) | Pain (8.424) | 10.592 |  | 10 | FBXO7, SLC5A4-AS1 |
|  | Hypotension (8.197) | Pain (8.424) | 12.959 |  | 3 | FBXO7, SLC5A4-AS1 |

Notes: We left as missing in the table any eGene (Ensembl gene ID from GTEx) that did not have an HGNC symbol counterpart.

Discussion

In this study, we conducted EHR-based univariate, bivariate, and multivariate analyses on 43,870 adults of European ancestry from the eMERGE network using 65 cardiovascular and neurological ICD-9 disease categories. The aim of this study was to detect pleiotropic genetic variants that influence diseases of the circulatory and nervous systems. We also evaluated the performance of three types of methods for detecting pleiotropy. We observed 79, 108, and, 58 unique variants, respectively that were detected by univariate, bivariate, and multivariate methods and 31 that overlapped among the three methods using a “family-wise Bonferroni” significance threshold. Univariate analysis suggests direct association between genetic variant and phenotype; bivariate association can offer insights into whether a variant is associated with a pair of phenotypes, whereas multivariate analysis is powerful in detecting if a variant is associated with multiple phenotypes. We took the intersection of the significant genetic variants across the three methods as our list of potential pleiotropic variants. Our colocalization analyses revealed 9 SNP variants associated with at least one disease from both, nervous and circulatory system that cleared the “family-wise Bonferroni” threshold for multivariate and bivariate analyses. Since we were looking at trait pairs here, we ensured that at least one of the two traits had a univariate p-value that cleared the “family-wise Bonferroni” threshold while the other trait had a univariate -log10 p-value of at least 3. Note that we conducted sensitivity analyses for MultiPhen on identified potentially pleiotropic variants in Table 2 when missing values were imputed with 0 and 1 (i.e. treated as controls or cases) in addition to 0.5 and observed no change in significance. 
To cross-check overlap between methods, we also performed multivariate analysis restricted to a pair of bivariate significant traits for the 9 potentially pleiotropic variants in Table 2 and found 100% consensus between bivariate and multivariate methods. These 9 variants showed strong evidence of colocalization with eQTLs across a host of tissue types (see Supplementary table 1) from the GTEx consortium[33], especially on chromosome 22. Our results replicated previous association signals as well as detected novel associations. SNP at chromosome 6 position 32569056 (rs9270779) has been directly implicated in autonomic nervous system and has been shown to be associated with heart rate response to exercise in females suggesting it could be pleiotropic for the two disease groupings of interest[39]. Also, the corresponding eGenes for this SNP, HLA-DRB5 and HLA-DRB9 from colocalization analysis have been previously shown to be associated with multiple sclerosis. Among the 31 total SNP hits, the one at chromosome 19 position 45416741 (rs438811) is correlated with rs445925 (r2=0.341), which has been shown to be clinically relevant to cardiovascular phenotypes[40]. This SNP is also located in the APOC1/APOE region, which has been shown to be associated with Alzheimer’s disease[41]. Among novel potential pleiotropic variants identified by all three methods and colocalization analysis, 6 out of 9 variants locate on chromosome 22, suggesting its potential crucial contribution to the link between cardiovascular and neurological diseases. In particular, the eGene FBXO7 has been associated with multiple sclerosis[42] as well as heart disease[43]. As part of future work, we will conduct pathway analyses or conditional analyses to have confidence in a singular pleiotropic association or shared biology between these disease groupings. 
The limitations of this study are that (1) using only ICD-9 codes instead of both ICD-9 and ICD-10 codes may have reduced the number of cases in our data; (2) using disease categories instead of disease codes as phenotypes might have reduced the specificity of the detected associations; (3) sample size considerations led to some diagnosis codes being left out of the analyses; (4) given our very conservative multiple comparison thresholds, we have likely reported only a fraction of all potential pleiotropic signals, leading to type II errors; and (5) we were unable to investigate how many of the additional associated variants obtained from the bivariate analyses, in comparison to the univariate and multivariate analyses, were “true positives”. To address the first two limitations, we are planning to incorporate both ICD-9 and ICD-10 codes to define primary phenotypes and to examine disease heterogeneity in the future. One way to investigate the fifth limitation would be to test for statistical colocalization on the top hits from the bivariate analyses[27]. However, this requires summary statistics obtained from independent datasets, which was not the case with our data. Replication of these signals in independent cohorts in the future can help us address this limitation.
In summary, we provide a framework for future pleiotropy analyses in EHR data. Our work expands the pleiotropy detection framework from univariate methods (e.g., PheWAS) to bivariate and multivariate methods in large-scale, real-world EHR data in order to cast a broader net for potentially pleiotropic signals across cardiovascular and neurological disorders. We also use colocalization analyses to enhance our understanding of the influence of gene expression on these potentially pleiotropic variants and, consequently, on disease risk. In the future, we will also try to replicate the partially overlapping SNP signals in independent cohorts.
  41 in total

1.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

2.  A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.

Authors:  Samsiddhi Bhattacharjee; Preetha Rajaraman; Kevin B Jacobs; William A Wheeler; Beatrice S Melin; Patricia Hartge; Meredith Yeager; Charles C Chung; Stephen J Chanock; Nilanjan Chatterjee
Journal:  Am J Hum Genet       Date:  2012-05-04       Impact factor: 11.025

3.  Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies.

Authors:  Joshua C Denny; Dana C Crawford; Marylyn D Ritchie; Suzette J Bielinski; Melissa A Basford; Yuki Bradford; High Seng Chai; Lisa Bastarache; Rebecca Zuvich; Peggy Peissig; David Carrell; Andrea H Ramirez; Jyotishman Pathak; Russell A Wilke; Luke Rasmussen; Xiaoming Wang; Jennifer A Pacheco; Abel N Kho; M Geoffrey Hayes; Noah Weston; Martha Matsumoto; Peter A Kopp; Katherine M Newton; Gail P Jarvik; Rongling Li; Teri A Manolio; Iftikhar J Kullo; Christopher G Chute; Rex L Chisholm; Eric B Larson; Catherine A McCarty; Daniel R Masys; Dan M Roden; Mariza de Andrade
Journal:  Am J Hum Genet       Date:  2011-10-07       Impact factor: 11.025

Review 4.  The role of the proteasome in heart disease.

Authors:  Yi-Fan Li; Xuejun Wang
Journal:  Biochim Biophys Acta       Date:  2010-09-15

5.  Pervasive sharing of genetic effects in autoimmune disease.

Authors:  Chris Cotsapas; Benjamin F Voight; Elizabeth Rossin; Kasper Lage; Benjamin M Neale; Chris Wallace; Gonçalo R Abecasis; Jeffrey C Barrett; Timothy Behrens; Judy Cho; Philip L De Jager; James T Elder; Robert R Graham; Peter Gregersen; Lars Klareskog; Katherine A Siminovitch; David A van Heel; Cisca Wijmenga; Jane Worthington; John A Todd; David A Hafler; Stephen S Rich; Mark J Daly
Journal:  PLoS Genet       Date:  2011-08-10       Impact factor: 5.917

Review 6.  Genome-wide association studies in Alzheimer's disease.

Authors:  Lars Bertram; Rudolph E Tanzi
Journal:  Hum Mol Genet       Date:  2009-10-15       Impact factor: 6.150

7.  Cognitive dysfunction after cardiac surgery: Pathophysiological mechanisms and preventive strategies.

Authors:  E F Bruggemans
Journal:  Neth Heart J       Date:  2013-02       Impact factor: 2.380

8.  Powerful bivariate genome-wide association analyses suggest the SOX6 gene influencing both obesity and osteoporosis phenotypes in males.

Authors:  Yao-Zhong Liu; Yu-Fang Pei; Jian-Feng Liu; Fang Yang; Yan Guo; Lei Zhang; Xiao-Gang Liu; Han Yan; Liang Wang; Yin-Ping Zhang; Shawn Levy; Robert R Recker; Hong-Wen Deng
Journal:  PLoS One       Date:  2009-08-28       Impact factor: 3.240

9.  A mixed-model approach for genome-wide association studies of correlated traits in structured populations.

Authors:  Arthur Korte; Bjarni J Vilhjálmsson; Vincent Segura; Alexander Platt; Quan Long; Magnus Nordborg
Journal:  Nat Genet       Date:  2012-08-19       Impact factor: 38.330

10.  MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

Authors:  Paul F O'Reilly; Clive J Hoggart; Yotsawat Pomyen; Federico C F Calboli; Paul Elliott; Marjo-Riitta Jarvelin; Lachlan J M Coin
Journal:  PLoS One       Date:  2012-05-02       Impact factor: 3.240

