Literature DB >> 30696806

Using phenome-wide association to investigate the function of a schizophrenia risk locus at SLC39A8.

Thomas H McCoy1, Amelia M Pellegrini1, Roy H Perlis2.   

Abstract

While nearly all common genomic variants associated with schizophrenia have no known function, one corresponds to a missense variant associated with change in efficiency of a metal ion transporter, ZIP8, coded by SLC39A8. This variant has been linked to a range of phenotypes and is believed to be under recent selection pressure, but its impact on health is poorly understood. We sought to understand phenotypic implications of this variant in a large genomic biobank using an unbiased phenome-wide approach. Specifically, we generated 50 topics based on diagnostic codes using latent Dirichlet allocation, and examined them for association with the risk variant. Then, any significant topics were further characterized by examining association with individual diagnostic codes contributing to the topic. Among 50 topics, 1 was associated at an experiment-wide significance threshold (beta = 0.003, uncorrected p = 0.00049), comprising predominantly brain-related codes, including intracranial hemorrhage, cerebrovascular disease, and delirium/dementia. These results suggest that a functional variant previously associated with schizophrenia risk also increases liability to cerebrovascular disease. They further illustrate the utility of a topic-based approach to phenome-wide association.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 30696806      PMCID: PMC6351652          DOI: 10.1038/s41398-019-0386-9

Source DB:  PubMed          Journal:  Transl Psychiatry        ISSN: 2158-3188            Impact factor:   6.222


Introduction

Despite the remarkable success of genome-wide association studies (GWAS) in medicine, a central challenge remains extrapolating from common-variant associations to actionable disease biology[1]. In particular, the polygenicity of most common disorders, and the lack of functional single-nucleotide polymorphisms (SNPs), renders follow-up of GWAS challenging even in cellular or animal models. As a complement to more traditional efforts at functional genomics using model systems, phenome-wide association, or PheWAS, seeks to understand the implications of a risk variant by characterizing associated phenotypes in vivo[2]. However, such studies carry substantial risk of type 1 error because they typically examine 1000 or more phenotypes. Moreover, many biobanks and registries rely on individual billing or claims codes for which reliability varies substantially[3-6]. To address both of these limitations, we have previously demonstrated that the use of probabilistic topic models, an approach drawn from natural language processing that draws on groups of related diagnostic codes rather than individual codes, provides interpretable dimensionality reduction as well as making efficient use of sparse count data (i.e., the fact that most individuals will not have any given diagnosis)[7,8]. Here we apply this method to examine phenotypic implications of a recently identified common variant associated with schizophrenia risk[9,10]. Notably, among all of the first 108 loci associated with schizophrenia, only this variant is a nonsynonymous coding SNP, which is functional (i.e., the risk allele is associated with decreased metal ion transport) and common in Northern European populations[11]. However, little is known about its physiologic role, particularly in the context of brain function. Therefore, to better understand this schizophrenia risk locus, we conducted a topic-based PheWAS in a large genomic biobank linked to electronic health records (EHRs) of multiple academic medical centers.

Materials and methods

Cohort derivation, genotyping, and quality control

The cohort was derived from the first four genotyping waves of the Partners HealthCare Biobank Initiative[12], which spans N = 20,084 inpatients and outpatients across two large academic medical centers and affiliates (n = 4927, 5353, 4784, and 5020). Participants provided written informed consent for EHRs to be analyzed in protocols approved by the Partners HealthCare Institutional Review Board, along with a blood sample for DNA extraction. After extraction of DNA from buffy coat, samples were genotyped via one of the Illumina Multi-Ethnic Genotyping Arrays, which include content from phase 3 of the 1000 Genomes Project. For details of genotyping, see our prior publication[7]. To address potential batch effects across the four genotyping waves, we cleaned, imputed, and analyzed each one separately. In each wave, participants were included with genotyping call rates >99%, and no related individuals based on identity by descent (defined by pi-hat > 0.25)[13]. From these individuals, genotyped SNPs were retained if call rate was at least 95% and Hardy-Weinberg equilibrium p value was >1 × 10–6. Genotypes were imputed using the Michigan Imputation Server implementing Minimac3[14-16] with all population subsets from 1000G Phase 3 v5 as reference panel; haplotypes were phased using SHAPEIT[17]. The SNP of interest here is imputed but with a high degree of confidence (rsq/info = 0.911; avg call = 0.99). Minor allele frequency is 0.079, consistent with other reports in European cohorts[18].

Ancestry

To address risk for stratification artifact, each genotyping wave was examined via principal components analysis of linkage-disequilibrium-pruned genotyped SNPs as a measure of population substructure, using the PLINK 1.9 implementation of EIGENSTRAT. HapMap samples of Northern European ancestry were used to confirm location of this population group[19-21], yielding n = 3593, 3327, 3552, and 3105 participants from genotyping waves 1–4, respectively.

Topic identification

As in our prior work, we identified topics based on the ninth revision of the International Statistical Classification of Diseases (ICD-9) diagnosis codes extracted from each individual’s EHR data, further grouped into top-level PheWAS codes intended to capture clinically meaningful disease categories[22]. We then applied frequency controls to eliminate PheWAS codes occurring in <0.5% of subjects, yielding 480 distinct PheWAS codes. The remaining PheWAS code count by subject matrix was used to fit a latent Dirichlet allocation (LDA) model with 50 topics; the 50 topic count was selected for consistency with our own prior work and in the absence of well-established methods for optimal topic count selection[23]. As we have described[7], this unsupervised machine learning method treats each subject’s medical record as if it were a document composed of PheWAS codes reflecting a mixture of underlying topics, or disease categories. The LDA model that results reflects a distribution of all PheWAS codes over each topic, although most codes contribute only a trivial amount. The fitted topic model was then used to extract topic membership scores for each subject. Topic modeling used R v3.4.3[24,25].

Analysis

Primary analysis examined association between the SNP of interest (rs13107325, at chr4:103188709 in hg19) and each of the 50 topics. Single-locus associations in each genotyping wave were examined individually, and then combined in inverse-variance-weighted fixed-effects meta-analysis in Plink 1.9. Tests for association used linear regression assuming an additive allelic effect treated each topic as a quantitative trait, and adjusted for the first 10 principal components a priori. Secondary analysis examined association between presence/absence of each diagnostic code with loading >0.01 (i.e., 1%) on any topics significant at p < 0.001 (i.e., 0.05/50 topics) with the SNP of interest; these analyses were similarly adjusted for principal components, and then for body mass index (BMI) as well.

Follow-up

To further characterize the risk allele, we examined data from the UK Biobank as analyzed by Neale and colleagues[26] and presented in the Global Biobank Engine (Global Biobank Engine, Stanford, CA; http://gbe.stanford.edu/). We queried rs13107325 to identify health-care codes most strongly enriched in this cohort, then examined the closest-corresponding ICD-10 codes for each individual PheWAS code in our most strongly associated topic.

Results

For 13,577 participants, mean age was 60.5 years (SD 16.1), 7473 (55.1%) were female, and mean BMI was 27.6 (SD 6.1). Among the 50 diagnostic topics, 1 was significantly associated with rs13107325 at an experiment-wide significance threshold (p = 0.00049; beta = 0.0029 for the minor (schizophrenia risk-increasing) allele) (Fig. 1 and Supplemental Table 1).
Fig. 1

Manhattan plot of association between rs13107325 and individual electronic health record-derived topics. Red line indicates experiment-wide significance

Manhattan plot of association between rs13107325 and individual electronic health record-derived topics. Red line indicates experiment-wide significance We next examined the 13 individual codes loading onto this topic with weights ≥0.01 (i.e., presence of a code is associated with 1% increase in probability of belonging in this diagnostic group). Table 1 lists these codes, along with weight in topic and univariate association. In particular, nominally significant univariate associations were observed with intracranial hemorrhage, delirium/dementia, other conditions of the brain, other cerebral degeneration, vertigo, cerebrovascular disease, and developmental disorders. In all cases, the minor (risk) allele is associated with greater risk for the phenotype. Nominal associations persisted after adjustment for BMI, known to be associated with this SNP in prior genome-wide studies (Table 1)[27].
Table 1

Diagnostic codes with weight in topic and univariate association

DiagnosisFrequency p OROR (wave 1)OR (wave 2)OR (wave 3)OR (wave 4)p (BMI-adjusted)OR (BMI-adjusted)ICD-10UKBB p
Intracranial hemorrhage0.0370.00088611.40621.2781.8511.0061.4140.0043961.8189I610.398
Delirium dementia and amnestic disorders0.0800.0013131.2691.1531.3411.2741.3020.0022291.4481F050.913
Other conditions of the brain0.1360.0082651.17371.2191.2161.1361.1120.055711.2068NANA
Other cerebral degenerations0.0200.0121.4782NA1.5481.691.0760.65350.8377NANA
Vertiginous syndromes0.2450.015621.1271.1421.1361.1091.1210.01791.196H810.978
Cerebrovascular disease0.2070.021081.12931.1461.2181.1061.0030.10551.1521I670.584
Developmental delays and disorders0.0220.035311.45451.5281.389NANANANANANA
Neurological disorders due to brain damage0.1540.11071.09841.1611.1531.0061.07
Abnormal movement0.1570.14741.08821.2791.0331.0460.9888
Dysphagia0.1120.26651.07820.96070.83651.2561.255
Other disorders of circulatory system0.0860.28851.08360.97090.9911.0211.454
Visual disturbances0.0760.30321.08660.84311.4150.99221.041
Hemiplegia0.0330.76111.03690.93391.0720.98031.147

Frequency refers to the proportion of individuals across the four study waves with at least one diagnostic code

OR odds ratio, BMI body mass index, NA frequency too low for regression result, ICD-10 10th revision of the International Statistical Classification of Diseases, UKBB United Kingdom Biobank

Diagnostic codes with weight in topic and univariate association Frequency refers to the proportion of individuals across the four study waves with at least one diagnostic code OR odds ratio, BMI body mass index, NA frequency too low for regression result, ICD-10 10th revision of the International Statistical Classification of Diseases, UKBB United Kingdom Biobank For comparison, we examined ICD-10 diagnostic codes associated with the risk allele by querying the UK Biobank association results using the Global Biobank Engine. For nominal p < 1e-03 (the same threshold as applied to our experiment-wide topics), associated diagnoses with increased risk included osteoarthritis (p = 2.06e-11), other joint disorder (2.50e-06), arthritis not otherwise specified (1.12e-05), hiatus hernia (1.47e-05), incontinence (2.14e-05), gastroesophageal reflux (3.58e-05), hay fever/rhinitis (8.03e-05), asthma (1.64e-04), motor neuron disease (4.75e-04), joint pain (5.53e-04), and back pain (6.42e-04). Table 1 also reports associations for the closest ICD-10 code corresponding to the individual PheWAS codes in the primary (intracranial hemorrhage-plus) topic; none of these was nominally associated. Notably, however, no association with schizophrenia codes was identified in either the UK Biobank (via ICD-10) (p = 0.39) or the Partners HealthCare Biobank (p = 0.96).

Discussion

In this analysis of 13,577 individuals of Northern European ancestry in a large hospital-based biobank linked to EHR, we identified a constellation of diagnostic codes associated with a previously reported schizophrenia risk-associated missense variant. The topic is notable for its coherence—i.e., the extent to which nearly all of the associated codes reflect cerebrovascular disease or sequelae—although some of the associated codes (e.g., “other conditions of the brain”) would not necessarily have been identified a priori as informative for analysis. Previous work has suggested that this schizophrenia risk SNP is pleiotropic, associated with multiple genome-wide association phenotypes including BMI[27]. Further, some evidence suggests this locus to be under selection pressure[28,29]. Here we sought to identify a group of codes associated with the variant as a means of better understanding potential pathophysiologic mechanisms. In particular, while the proximal function of the SLC39A8 gene product, ZIP8, is known, the implications of the schizophrenia risk gene are not. Studies of null mutations in rodents suggested important developmental effects[30], while other functional mutations in humans have been associated with disorders of glycosylation[31]. However, the mechanism by which ZIP8 may contribute to schizophrenia risk is unknown. Speculative mechanisms range from immune modulation to metabolic effects and modulation of excitotoxicity via glutamate signaling (for a review, see Costas)[11]. Our results are consistent with both of these, and suggest the utility of investigating the transporter further in stroke-related injury associated with immune activation, where levels of ZIP8 expression have been shown to be high[32]. While we were unable to directly examine PheWAS/ICD-9 code-based topics in the UK Biobank, we did seek to examine the nearest match in individual ICD-10 codes. This analysis does not represent replication per se, as the correspondence between ICD-9 and -10 codes may be poor. Among those ICD-10 codes significantly increased in individuals with the risk allele, the preponderance relate to osteoarthritis and joint pain, which may be sequelae of obesity. Notably, we do not find evidence of replication for individual ICD-9 code associations mapped to ICD-10—nor even of replication of the robust schizophrenia association reported in prior studies (p = 1.54e-12; odds ratio 1.16, SE 0.02). Taken together, these follow-up results underscore the challenges in using single diagnostic codes, particularly when comparing across health systems. They further illustrate the need for additional replication of our topic-based approach. Nonetheless, our results provide further support for the notion that topic-based genome-wide association is a powerful means of addressing the variable reliability of individual diagnostic codes while facilitating phenome-wide investigation, or simply reverse genomics[7]. It provides control of type I error by limiting the number of phenotypes tested, such that only topics achieving experiment-wide association require further investigation. In prior work, we demonstrated that under most scenarios, power to detect association will be greater with this approach; the exception is circumstances where a single diagnostic code captures essentially all of the relevant variance associated with a variant. The extrapolation from GWAS results to biology remains a great challenge, particularly for brain diseases where model systems may be more limited. Nonetheless, if the promise of modern genomics is to be fulfilled, bridging this gap is necessary, particularly to enable development of pharmacologic interventions tied to genomics as has been done in other disorders[1]. For schizophrenia, investigating the ZIP8 locus in large biobanks may help to complement and extend efforts in cellular and animal models to understand this complex disease.

Disclaimer

The sponsor had no role in study design, writing of the report, or data collection, analysis, or interpretation. The corresponding and senior authors had full access to all data and made the decision to submit for publication.
  25 in total

1.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

2.  minimac2: faster genotype imputation.

Authors:  Christian Fuchsberger; Gonçalo R Abecasis; David A Hinds
Journal:  Bioinformatics       Date:  2014-10-22       Impact factor: 6.937

3.  Enhancing Delirium Case Definitions in Electronic Health Records Using Clinical Free Text.

Authors:  Thomas H McCoy; Deanna C Chaukos; Leslie A Snapper; Kamber L Hart; Theodore A Stern; Roy H Perlis
Journal:  Psychosomatics       Date:  2016-10-27       Impact factor: 2.386

Review 4.  Phenome-Wide Association Studies as a Tool to Advance Precision Medicine.

Authors:  Joshua C Denny; Lisa Bastarache; Dan M Roden
Journal:  Annu Rev Genomics Hum Genet       Date:  2016-05-04       Impact factor: 8.929

5.  SLC39A8 Deficiency: A Disorder of Manganese Transport and Glycosylation.

Authors:  Julien H Park; Max Hogrebe; Marianne Grüneberg; Ingrid DuChesne; Ava L von der Heiden; Janine Reunert; Karl P Schlingmann; Kym M Boycott; Chandree L Beaulieu; Aziz A Mhanni; A Micheil Innes; Konstanze Hörtnagel; Saskia Biskup; Eva M Gleixner; Gerhard Kurlemann; Barbara Fiedler; Heymut Omran; Frank Rutsch; Yoshinao Wada; Konstantinos Tsiakas; René Santer; Daniel W Nebert; Stephan Rust; Thorsten Marquardt
Journal:  Am J Hum Genet       Date:  2015-12-03       Impact factor: 11.025

Review 6.  10 Years of GWAS Discovery: Biology, Function, and Translation.

Authors:  Peter M Visscher; Naomi R Wray; Qian Zhang; Pamela Sklar; Mark I McCarthy; Matthew A Brown; Jian Yang
Journal:  Am J Hum Genet       Date:  2017-07-06       Impact factor: 11.025

7.  Validation of electronic health record phenotyping of bipolar disorder cases and controls.

Authors:  Victor M Castro; Jessica Minnier; Shawn N Murphy; Isaac Kohane; Susanne E Churchill; Vivian Gainer; Tianxi Cai; Alison G Hoffnagle; Yael Dai; Stefanie Block; Sydney R Weill; Mireya Nadal-Vicens; Alisha R Pollastri; J Niels Rosenquist; Sergey Goryachev; Dost Ongur; Pamela Sklar; Roy H Perlis; Jordan W Smoller
Journal:  Am J Psychiatry       Date:  2014-12-12       Impact factor: 18.112

8.  Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.

Authors:  Brenna M Henn; Lawrence Hon; J Michael Macpherson; Nick Eriksson; Serge Saxonov; Itsik Pe'er; Joanna L Mountain
Journal:  PLoS One       Date:  2012-04-03       Impact factor: 3.240

9.  Biological insights from 108 schizophrenia-associated genetic loci.

Authors: 
Journal:  Nature       Date:  2014-07-22       Impact factor: 49.962

10.  Signatures of Evolutionary Adaptation in Quantitative Trait Loci Influencing Trace Element Homeostasis in Liver.

Authors:  Johannes Engelken; Guadalupe Espadas; Francesco M Mancuso; Nuria Bonet; Anna-Lena Scherr; Victoria Jímenez-Álvarez; Marta Codina-Solà; Daniel Medina-Stacey; Nino Spataro; Mark Stoneking; Francesc Calafell; Eduard Sabidó; Elena Bosch
Journal:  Mol Biol Evol       Date:  2015-11-17       Impact factor: 16.240

View more
  8 in total

Review 1.  Divalent Metal Uptake and the Role of ZIP8 in Host Defense Against Pathogens.

Authors:  Derrick R Samuelson; Sabah Haq; Daren L Knoell
Journal:  Front Cell Dev Biol       Date:  2022-06-27

2.  N-glycome analysis detects dysglycosylation missed by conventional methods in SLC39A8 deficiency.

Authors:  Julien H Park; Robert G Mealer; Abdallah F Elias; Susanne Hoffmann; Marianne Grüneberg; Saskia Biskup; Manfred Fobker; Jaclyn Haven; Ute Mangels; Janine Reunert; Stephan Rust; Jonathan Schoof; Corbin Schwanke; Jordan W Smoller; Richard D Cummings; Thorsten Marquardt
Journal:  J Inherit Metab Dis       Date:  2020-09-14       Impact factor: 4.982

3.  Distribution of agitation and related symptoms among hospitalized patients using a scalable natural language processing method.

Authors:  Kamber L Hart; Amelia M Pellegrini; Brent P Forester; Sabina Berretta; Shawn N Murphy; Roy H Perlis; Thomas H McCoy
Journal:  Gen Hosp Psychiatry       Date:  2020-11-10       Impact factor: 3.238

Review 4.  Pleiotropy and Cross-Disorder Genetics Among Psychiatric Disorders.

Authors:  Phil H Lee; Yen-Chen A Feng; Jordan W Smoller
Journal:  Biol Psychiatry       Date:  2020-10-10       Impact factor: 13.382

5.  Schizophrenia-associated SLC39A8 polymorphism is a loss-of-function allele altering glutamate receptor and innate immune signaling.

Authors:  Wei Chou Tseng; Veronica Reinhart; Thomas A Lanz; Mark L Weber; Jincheng Pang; Kevin Xuong Vinh Le; Robert D Bell; Patricio O'Donnell; Derek L Buhl
Journal:  Transl Psychiatry       Date:  2021-02-19       Impact factor: 6.222

6.  The Impact of ZIP8 Disease-Associated Variants G38R, C113S, G204C, and S335T on Selenium and Cadmium Accumulations: The First Characterization.

Authors:  Zhan-Ling Liang; Heng Wee Tan; Jia-Yi Wu; Xu-Li Chen; Xiu-Yun Wang; Yan-Ming Xu; Andy T Y Lau
Journal:  Int J Mol Sci       Date:  2021-10-22       Impact factor: 5.923

Review 7.  Impact of Zinc Transport Mechanisms on Embryonic and Brain Development.

Authors:  Jeremy Willekens; Loren W Runnels
Journal:  Nutrients       Date:  2022-06-17       Impact factor: 6.706

Review 8.  SLC39A8 gene encoding a metal ion transporter: discovery and bench to bedside.

Authors:  Daniel W Nebert; Zijuan Liu
Journal:  Hum Genomics       Date:  2019-09-14       Impact factor: 4.639

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.