A population scale analysis of rare SNCA variation in the UK Biobank.

Cornelis Blauwendraat¹, Mary B Makarious², Hampton L Leonard³, Sara Bandres-Ciga⁴, Hirotaka Iwaki³, Mike A Nalls³, Alastair J Noyce⁵, Andrew B Singleton⁴.

Abstract

Parkinson's disease (PD) is a complex neurodegenerative disease with a variety of genetic and environmental factors contributing to disease. The SNCA gene encodes for the alpha-synuclein protein which plays a central role in PD, where aggregates of this protein are one of the pathological hallmarks of disease. Rare point mutations and copy number gains of the SNCA gene have been shown to cause autosomal dominant PD, and common DNA variants identified using Genome-Wide Association Studies (GWAS) are a moderate risk factor for PD. The UK Biobank is a large-scale population prospective study including ~500,000 individuals that has revolutionized human genetics. Here we assessed the frequency of SNCA variation in this cohort and identified 30 subjects carrying variants of interest including duplications (n = 6), deletions (n = 6) and large complex likely mosaic events (n = 18). No known pathogenic missense variants were identified. None of these subjects were reported to be a PD case, although it is possible that these individuals may develop PD at a later age, and whilst three had known prodromal features, these did not meet defined clinical criteria for being considered 'prodromal' cases. Seven of the 18 large complex carriers showed a history of blood based cancer. Overall, we identified copy number variants in the SNCA region in a large population based cohort without reported PD phenotype and symptoms. Putative mosaicism of the SNCA gene was identified, however, it is unclear whether it is associated with PD. These individuals are potential candidates for further investigation by performing SNCA RNA and protein expression studies, as well as promising clinical trial candidates to understand how duplication carriers potentially escape PD. Published by Elsevier Inc.

Keywords: Alpha-synuclein; Copy number variants; Mosaicism; Parkinson's disease; SNCA

Year: 2020 PMID: 33307186 PMCID: PMC7880248 DOI: 10.1016/j.nbd.2020.105182

Source DB: PubMed Journal: Neurobiol Dis ISSN: 0969-9961 Impact factor: 5.996

Introduction

Parkinson’s disease (PD) often has a genetic component, either via rare damaging variants, more common risk variants, or a complex combination of these. Rare missense variants and copy number gains of the SNCA gene have been shown to cause autosomal dominant PD (Singleton et al., 2003; Polymeropoulos et al., 1997; Chartier-Harlin et al., 2004). Additionally, several non-coding common variants have been found to moderately increase risk for PD (Nalls et al., 2019) and modify age at onset (Blauwendraat et al., 2019), with a disease mechanism that likely involves increased SNCA expression (Pihlstrøm et al., 2018; Soldner et al., 2016). Importantly, SNCA encodes the alpha-synuclein protein, and aggregates of this protein have been identified in Lewy bodies, which are one of the major histopathological hallmarks of PD. Overall, this puts SNCA at a central position in PD pathogenesis, and potential disease-modifying therapies aimed at lowering SNCA levels are currently under development. Pathogenic missense SNCA variants are very rare in the general population, with five disease-causing mutations reported (A30P, E46K, G51D, A53E and A53T). SNCA copy number variants, although still rare, appear to be more frequent than pathogenic missense variants, with approximately 60 families reported to date. More rapid progression and atypical PD presentation are fairly common for all SNCA mutation carriers, with increased prevalence of symptoms such as dementia, rapid eye movement sleep behavior disorder, and autonomic dysfunction. Age of onset for SNCA missense variant and copy number variant carriers is earlier than in idiopathic PD, with a broad range of age of onset in which late forties/early fifties tend to be the most frequent (Book et al., 2018; Konno et al., 2016).
Besides these autosomal dominant inheritance patterns, SNCA copy number gains also have been reported via mosaic patterns (Perez-Rodriguez et al., 2019; Mokretar et al., 2018), although larger studies are needed to confirm whether this is a general PD pathogenesis mechanism. Here, we assess the presence and frequency of pathogenic SNCA variants (single nucleotide variants and copy number variants) in the population scale UK Biobank cohort (Bycroft et al., 2018).

Methods

UK Biobank genotype data

UK Biobank B allele frequency (BAF) and Log2 Ratio (L2R) data (v2, June 2020), covering 488,377 individuals, were downloaded from the UK Biobank. BAF and L2R values were extracted for the larger SNCA gene region (chr4:84992449–96412248, hg19), and these parameters were investigated in three SNCA regions: 1) SNCA gene ±5 Mb (chr4:85645250–95759466, hg19), 2) SNCA gene ±0.5 Mb (chr4:90145250–91259466, hg19), and 3) SNCA gene body (chr4:90645250–90759466, hg19). For additional wider chromosome exploration, the SNCA gene ±20 Mb (chr4:70645250–110759466, hg19) and the full chromosome 4 (chr4:1–180915260, hg19) were used. For samples with large genomic events, the remaining autosomes were further inspected. Typically, BAF values of ~0.66 and ~0.33 correspond to potential duplications. As part of our approach, BAF values between 0.65 and 0.85 or between 0.15 and 0.35 were counted for each of the three genetic regions, and if the variant count was higher than 6, the sample was plotted and visually inspected. L2R values were averaged for each of the three genetic regions, where high average values would correspond to potential duplications and low average values would correspond to potential deletions. After selecting subjects of interest, a total of 363 individuals were manually inspected, of which 30 showed an event of interest, including potential duplications, deletions and more complex events. BAF and L2R values were plotted in R (https://www.r-project.org/) using ggplot2, and averages were calculated using the LOWESS (Locally Weighted Scatterplot Smoothing) function with a variable smoother span based on the size of the total graph (ranging from 0.01 to 0.0001). UK Biobank genotype data (v2) was used to calculate relatedness using PLINK (v1.9) (Chang et al., 2015).
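The screening step described above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' code (their analysis was done in R and is available in their repository). The function names and the probe values are hypothetical; only the gene-body coordinates, the two BAF bands (0.15–0.35 and 0.65–0.85) and the count threshold of 6 come from the text.

```python
from statistics import mean

# SNCA gene body on chr4 (hg19), as given in the text.
SNCA_START, SNCA_END = 90_645_250, 90_759_466


def padded_region(pad_bp):
    """Return (start, end) of the SNCA gene body extended by pad_bp on each side."""
    return SNCA_START - pad_bp, SNCA_END + pad_bp


def count_intermediate_baf(baf_values):
    """Count probes whose BAF falls inside either of the two bands from the text."""
    return sum(1 for b in baf_values if 0.65 < b < 0.85 or 0.15 < b < 0.35)


def screen_region(baf_values, l2r_values, min_count=6):
    """Summarise one sample in one region.

    Returns the band count, the mean L2R, and whether the sample would be
    queued for plotting and visual inspection (count strictly above min_count).
    """
    n_band = count_intermediate_baf(baf_values)
    return {
        "n_band": n_band,
        "mean_l2r": mean(l2r_values),
        "inspect": n_band > min_count,
    }


# HYPOTHETICAL probe values, for illustration only.
baf = [0.0, 0.33, 0.67, 1.0, 0.66, 0.34, 0.68, 0.31, 0.70, 0.5, 0.32]
l2r = [0.25, 0.5, 0.25, 0.5, 0.25, 0.5, 0.25, 0.5, 0.25, 0.5, 0.25]

result = screen_region(baf, l2r)
half_mb_window = padded_region(500_000)
```

Padding the gene body by 0.5 Mb, 5 Mb and 20 Mb reproduces the three window coordinates quoted in the text. A sample flagged here would still need the visual inspection step the authors describe; the flag alone is not a copy number call.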

Exome sequencing data

UK Biobank exome sequencing data (FE dataset, field codes: 23160 and 23161, June 2020), covering 49,960 individuals, were downloaded from the UK Biobank. Variants were annotated using ANNOVAR (Wang et al., 2010). A comprehensive assessment of known pathogenic variants including A30P, E46K, G51D, A53E and A53T was performed. Furthermore, we screened for previously reported pathogenic variants from both ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) and the Human Gene Mutation Database (HGMD) (Professional 2020.2, Qiagen), as well as potentially relevant loss-of-function variants (stop, frameshift, splicing). Exome sequencing data (OQFE dataset, field code: 23151, December 2020) were used to replicate genomic events using the allelic depth (AD) field included in each individual’s VCF file. The AD ratio was calculated by dividing the highest by the lowest depth value and subsequently plotted using ggplot2 in R (Wickham, 2016). A total of ten random samples were selected as negative controls. Additional population allele frequencies were obtained from the gnomAD population database (https://gnomad.broadinstitute.org/, v2.1.1, June 2020) (Karczewski et al., 2020).
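The allelic depth ratio described above (highest depth divided by lowest depth for a heterozygous call) can be sketched as below. This is an illustrative Python version under stated assumptions: the function name, the handling of zero depth and the example records are hypothetical and are not taken from the study's code.

```python
def ad_ratio(allelic_depths):
    """Ratio of the higher to the lower allelic depth for one heterozygous call.

    Returns None when the lower depth is zero, since the ratio is undefined
    (how such calls were handled in the study is not stated; this is an assumption).
    """
    high, low = max(allelic_depths), min(allelic_depths)
    if low == 0:
        return None
    return high / low


# HYPOTHETICAL (position, (ref_depth, alt_depth)) records, for illustration only.
calls = [
    (90_700_000, (20, 20)),  # balanced alleles
    (95_000_000, (30, 15)),  # two-to-one imbalance
    (99_000_000, (12, 0)),   # undefined ratio, skipped below
]

ratios = [(pos, ad_ratio(ad)) for pos, ad in calls if ad_ratio(ad) is not None]
```

A ratio near 1 is what balanced heterozygous calls would be expected to show, while a sustained shift across neighbouring variants is the kind of signal the authors looked for when plotting these values along chromosome 4.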

UK Biobank phenotype data

UK Biobank phenotype data was obtained from ICD10 codes (field code: 41270), PD (field code: 131023), illnesses of father and mother (field codes: 20107 and 20110), parkinsonism (field code: 42031) or dementia (field code: 42018), genetic ethnic grouping (field code: 22006), year of birth (field code: 34) and age at recruitment (field code: 21022). The MDS prodromal criteria were used to assess a potential PD prodromal phenotype (Heinzel et al., 2019).

Data and code availability

All data used as part of this study is publicly available upon application at the UK Biobank website (https://www.ukbiobank.ac.uk/) and from the gnomAD database (https://gnomad.broadinstitute.org/). Code used for both preprocessing and subsequent analyses is available on our GitHub repository https://github.com/neurogenetics/UKbiobank_SNCA.

Results

Here we assessed the presence and frequency of potentially damaging SNCA mutations in the UK Biobank large-scale population level cohort. None of the five known SNCA pathogenic variants (A30P, E46K, G51D, A53E and A53T) were identified in the UK Biobank exome sequencing data. Two SNCA variants with conflicting or unconfirmed pathogenicity were detected (Sup Table 1) at frequencies similar to those in the gnomAD database. None of the variant carriers was a reported PD case; however, three showed a positive family history for PD. Thirteen subjects carried the SNCA P117T variant and one of these had a parent affected with PD. Additionally, a total of 28 subjects carried the SNCA H50Q variant, of which two had a parent affected with PD. Note that none of these observations showed statistically significant results (p > 0.3, 2 × 2 Fisher exact test, Sup Table 4). Next, we explored the presence and frequency of SNCA copy number variants by assessing BAF and L2R values in the UK Biobank genotyping data. A total of 30 carriers were identified and categorized into three groups: duplication carriers (n = 6), deletion carriers (n = 6) and large “complex” event carriers (n = 18). Large complex event carriers were subjects in whom the BAF and L2R values did not meet the criteria for a “small” duplication or deletion but rather showed evidence of a large mosaicism event (Sup Table 2). Six potential SNCA duplications were identified (Fig. 1), all of which showed clear BAF and L2R changes implying an SNCA duplication. The average estimated size was ~2 Mb, ranging from ~0.8 Mb to ~6 Mb. Estimated breakpoints are reported in Sup Table 2. Furthermore, six potential SNCA deletions were identified, of which five were complete gene deletions and one was a likely partial gene deletion (covering the transcription start site and first exons) but sufficient to likely result in haploinsufficiency (Fig. 2). The average estimated size was ~0.76 Mb, ranging from ~0.2 Mb to ~1 Mb.
Besides the relatively straightforward duplications and deletions, 18 complex events were detected (Sup. Fig. 1). Inspection of the wider SNCA region (±20 Mb), the full chromosome 4, and the remaining autosomes suggested that most events are likely due to mosaicism, mostly spanning the majority of the long arm of chromosome 4 (4q11-4q35). Notably, one individual (#comp16) carried an additional large event on chromosome 20. After closer inspection, the majority were classified as large complex events with likely varying levels of mosaicism, of which two were likely mosaic deletions due to a uniparental disomy event (Sup. Fig. 1).
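For readers who want to see how a 2 × 2 Fisher exact test of the kind used for the family-history comparison works, a minimal pure-Python sketch follows. It is not the authors' analysis: the function name is hypothetical, and in the example only the carrier row (28 H50Q carriers, two with an affected parent) comes from the text, while the non-carrier row is invented purely to fill out the table. The underlying counts are in Sup Table 4, which is not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: carriers, non-carriers. Columns: parent with PD, no parent with PD.
# Carrier row from the text; the non-carrier row is HYPOTHETICAL.
p_h50q = fisher_exact_two_sided(2, 26, 2000, 47932)
```

With so few carriers, a test like this has very little power, which is consistent with the authors' remark that the observations were not statistically significant.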

Fig. 1.

SNCA whole gene duplications identified in the UK Biobank cohort.

Fig. 2.

SNCA whole gene deletions and one (likely) partial SNCA deletion (D) identified in the UK Biobank cohort.

As partial validation of the results obtained from the genotyping array data, we explored exome sequencing allelic depth data (sequencing depth of each allele) for all heterozygous variants on chromosome 4. Exome sequencing was available for 14 of the 30 subjects (3 from UK Biobank field code 23161 and 14 from field code 23151) with a potential SNCA alteration (Sup. Table 2). All large events (>6 Mb) showed clear differences in allelic depth in the expected areas of the genomic events (Fig. 3 and Sup. Fig. 2). However, the smaller events (<2 Mb) did not show clear differences in allelic depth, likely because exome sequencing coverage leaves few usable variants in these regions (Sup. Fig. 2). We used 10 random subjects from the exome sequencing data as negative controls, none of which showed evidence of an allelic imbalance (Sup. Fig. 3).

Fig. 3.

Partial validation of genomic events from the genotyping array data using exome sequencing data. A) Duplication of the SNCA gene region (subject #dup3), B) Partial duplication of the long arm of chromosome 4 (subject #comp8), C) Partial duplication of the long arm of chromosome 4 (subject #comp17). Red and blue vertical lines represent the SNCA gene body.

When assessing the UK Biobank phenotypic data of the subjects with a potential SNCA alteration, none of them had been reported to be affected with PD, parkinsonism or dementia. Further inspection of ICD10 codes did not result in a potential PD diagnosis, although we cannot discard the possibility that some may be prodromal PD cases based on non-specific symptoms which are common in the older general population, including depression, anxiety, syncope, and head injury. Three participants were further evaluated to ascertain the probability of them being prodromal cases according to the recently revised MDS Prodromal Criteria. None of the participants met the clinical threshold criteria for probable prodromal PD (see Sup. Table 3). Of the 24 subjects of interest, two had a parent affected with dementia (#comp3 and #comp6) and one had a parent with PD (#comp6). Note that none of these observations showed statistically significant results (p > 0.1, Chi-square test). Of note, seven of the 18 (39%) potential mosaic subjects (all large complex events) showed a history of blood-based cancer (including lymphoma, multiple myeloma or myelodysplastic syndrome, Sup. Table 2). The average age at recruitment was 52 (range 41–56) for SNCA deletion carriers, 61 (range 49–69) for SNCA duplication carriers (all European), and 61 (range 42–69) for SNCA complex event carriers, of which two were not grouped as European. Interestingly, two pairs of individuals are siblings based on their relatedness value (PI_HAT ~0.5), with reported birth years ~2 years apart. In pair one (subjects #dup1 and #dup6), both carry a 1.5 Mb SNCA duplication, and in pair two (subjects #del2 and #del5), both carry a 0.8 Mb SNCA deletion.
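The sibling inference above combines a relatedness estimate near 0.5, which is expected for any first-degree pair, with a small gap in birth years, which argues against a parent-offspring pair. A rough Python sketch of that reasoning is given below. The function name, the PI_HAT band and the age-gap cut-off are all illustrative assumptions and are not values used in the study.

```python
def classify_pair(pi_hat, birth_year_a, birth_year_b,
                  first_degree_band=(0.4, 0.6), max_sibling_gap=15):
    """Label a pair from its PI_HAT estimate and the two birth years.

    The band and the age-gap cut-off are ILLUSTRATIVE ASSUMPTIONS,
    not thresholds taken from the study.
    """
    low, high = first_degree_band
    if not (low <= pi_hat <= high):
        return "not first-degree"
    gap = abs(birth_year_a - birth_year_b)
    if gap <= max_sibling_gap:
        return "likely siblings"
    return "likely parent-offspring"


# HYPOTHETICAL birth years, roughly two years apart as in the text.
label = classify_pair(0.5, 1950, 1952)
```

A more rigorous check would look at the separate identity-by-descent sharing proportions rather than birth years alone, since parent-offspring and sibling pairs differ in how that sharing is distributed.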

Discussion

SNCA missense variants and copy number gains are a clear cause of autosomal dominant PD. Here we assessed the UK Biobank cohort for potential pathogenic variants and genetic rearrangements in SNCA. None of the five known missense mutations were identified. Several other SNCA missense variants with conflicting pathogenicity reports were identified, for which none of the carriers was reported to be a PD case. Although three individuals did report a positive family history for PD, the allele frequency of these variants was too low to robustly test whether this frequency is more than expected by chance. Note that we previously suggested that the SNCA H50Q variant is likely not pathogenic in an autosomal dominant inheritance fashion (Blauwendraat et al., 2018) despite the presence of functional evidence (Boyer et al., 2019; Ruf et al., 2019). We identified 6 copy number gains, 6 copy number losses and 18 complex SNCA events. Surprisingly, none of these carriers have been reported to be PD cases, and although three had ICD10 codes that relate to recognised prodromal PD symptoms, none met published probability thresholds to be considered a prodromal case (Heinzel et al., 2019). Separately, one SNCA deletion carrier had an ICD10 code for schizophrenia, which has been suggested to be a prodromal symptom in SNCA duplication carriers (Takamura et al., 2016). However, while SNCA copy number gains are a clear cause of autosomal dominant PD, SNCA copy number losses (deletions) are, to our knowledge, not reported to cause PD. One possible explanation for the lack of PD status in these subjects is that they will develop PD at a later stage. The average age at recruitment for the six duplication carriers is 61 (range 49–69). A large recent meta-analysis of duplication carriers showed an average age of onset of 46.9 with a range of 30–73 (Book et al., 2018), suggesting that these subjects could still develop PD at a later age.
The identification of 18 potential mosaic events is of interest, especially given the prior evidence of SNCA mosaicism in a PD case (Pankratz et al., 2011) and mosaicism in synucleinopathy brain tissue (Perez-Rodriguez et al., 2019; Mokretar et al., 2018). Fourteen of the identified genetic rearrangements were classified as large complex events (100+ Mb) with likely varying levels of mosaicism (Conlin et al., 2010). Three were likely mosaic deletions of the larger SNCA region, 20–50 Mb in size, while one was likely a uniparental disomy event spanning the majority of the long arm of chromosome 4. In the UK Biobank data, clonal hematopoiesis, mosaic effects and their association with cancer have already been shown to be present on a larger scale than previously assumed (Loh et al., 2018; Tuke et al., 2020; Loh et al., 2020; Dawoud et al., 2020). Therefore, not surprisingly, seven of the 18 (39%) potential mosaic subjects (all large complex events) showed a history of blood-based cancer. It is important to note that the current data is based on blood-derived DNA, and because these events may reflect clonal hematopoiesis, it is unclear whether they are also present in more PD-related tissues and cells such as the brain. In addition, it is unclear what effect these events have on SNCA gene expression and alpha-synuclein protein expression. A deeper investigation of these genomic events is needed before robust disease associations can be made. Although our findings are based on a population level cohort, there are several clear limitations. First, the gold standard of copy number variant detection is multiplex ligation-dependent probe amplification (MLPA). However, DNA of these subjects is not readily available, so validation of these genomic events by MLPA is not possible.
In an attempt to overcome this limitation, we partially validated a subset of the larger genomic events using the available exome sequencing data, in which clear events were detected, although gene and protein expression investigations would be necessary to confirm the functional consequences. Unfortunately, the smaller genomic events were not validated in the exome sequencing data, likely due to limited coverage in the smaller regions. More extensive validation is needed in the future to determine the exact size and downstream effects of these variants, especially of the complex mosaic events. Second, although the UK Biobank is the largest and most comprehensive genotype/phenotype cohort to date, phenotype data may be missing for various reasons when using electronic health record (EHR) ICD10 codes. Therefore, although PD or a prodromal PD phenotype could have been missed in some instances, it is unlikely that this information is missing in all carriers of potential duplication events. Third, our approach is partly based on manual interpretation of prioritized subjects and is biased towards larger events; smaller deletions (e.g. single exon) or complex rearrangements are easily missed due to the genotyping array design (spacing and number of variants in the SNCA gene region). Although manual interpretation is not ideal, it is more likely to be conservative than automated algorithms, since these often also require a manual interpretation step. Of note, multiple copy number variant events were detected outside of the SNCA gene body region, which can have a significant impact on expression. To confirm this, RNA sequencing data would be needed. Overall, we found that copy number variants and mosaicism in the SNCA region are present in the general population without reported PD symptoms.
These subjects are outstanding candidates for more thorough investigation of SNCA levels and general SNCA biology, are potential clinical trial candidates, and may help to show how duplication carriers potentially escape PD. Mosaicism of the SNCA gene region in blood-derived genotype data is of interest, although this mechanism needs to be explored on a larger case-control scale to assess whether it is associated with disease.

26 in total

1. Mutation in the alpha-synuclein gene identified in families with Parkinson's disease.

Authors: M H Polymeropoulos; C Lavedan; E Leroy; S E Ide; A Dehejia; A Dutra; B Pike; H Root; J Rubenstein; R Boyer; E S Stenroos; S Chandrasekharappa; A Athanassiadou; T Papapetropoulos; W G Johnson; A M Lazzarini; R C Duvoisin; G Di Iorio; L I Golbe; R L Nussbaum
Journal: Science Date: 1997-06-27 Impact factor: 47.728

2. Copy number variation in familial Parkinson disease.

Authors: Nathan Pankratz; Alexandra Dumitriu; Kurt N Hetrick; Mei Sun; Jeanne C Latourelle; Jemma B Wilk; Cheryl Halter; Kimberly F Doheny; James F Gusella; William C Nichols; Richard H Myers; Tatiana Foroud; Anita L DeStefano
Journal: PLoS One Date: 2011-08-02 Impact factor: 3.240

3. Different Effects of α-Synuclein Mutants on Lipid Binding and Aggregation Detected by Single Molecule Fluorescence Spectroscopy and ThT Fluorescence-Based Measurements.

Authors: Viktoria C Ruf; Georg S Nübling; Sophia Willikens; Song Shi; Felix Schmidt; Johannes Levin; Kai Bötzel; Frits Kamp; Armin Giese
Journal: ACS Chem Neurosci Date: 2019-01-16 Impact factor: 4.418

4. Mechanisms of mosaicism, chimerism and uniparental disomy identified by single nucleotide polymorphism array analysis.

Authors: Laura K Conlin; Brian D Thiel; Carsten G Bonnemann; Livija Medne; Linda M Ernst; Elaine H Zackai; Matthew A Deardorff; Ian D Krantz; Hakon Hakonarson; Nancy B Spinner
Journal: Hum Mol Genet Date: 2010-01-06 Impact factor: 6.150

5. Alpha-synuclein locus duplication as a cause of familial Parkinson's disease.

Authors: Marie-Christine Chartier-Harlin; Jennifer Kachergus; Christophe Roumier; Vincent Mouroux; Xavier Douay; Sarah Lincoln; Clotilde Levecque; Lydie Larvor; Joris Andrieux; Mary Hulihan; Nawal Waucquier; Luc Defebvre; Philippe Amouyel; Matthew Farrer; Alain Destée
Journal: Lancet Date: 2004 Sep 25-Oct 1 Impact factor: 79.321

6. Schizophrenia as a prodromal symptom in a patient harboring SNCA duplication.

Authors: Shogo Takamura; Aya Ikeda; Kenya Nishioka; Hirokazu Furuya; Mari Tashiro; Takashi Matsushima; Yuanzhe Li; Hiroyo Yoshino; Manabu Funayama; Shigeru Morinobu; Nobutaka Hattori
Journal: Parkinsonism Relat Disord Date: 2016-02-08 Impact factor: 4.891

7. Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies.

Authors: Mike A Nalls; Cornelis Blauwendraat; Costanza L Vallerga; Karl Heilbron; Sara Bandres-Ciga; Diana Chang; Manuela Tan; Demis A Kia; Alastair J Noyce; Angli Xue; Jose Bras; Emily Young; Rainer von Coelln; Javier Simón-Sánchez; Claudia Schulte; Manu Sharma; Lynne Krohn; Lasse Pihlstrøm; Ari Siitonen; Hirotaka Iwaki; Hampton Leonard; Faraz Faghri; J Raphael Gibbs; Dena G Hernandez; Sonja W Scholz; Juan A Botia; Maria Martinez; Jean-Christophe Corvol; Suzanne Lesage; Joseph Jankovic; Lisa M Shulman; Margaret Sutherland; Pentti Tienari; Kari Majamaa; Mathias Toft; Ole A Andreassen; Tushar Bangale; Alexis Brice; Jian Yang; Ziv Gan-Or; Thomas Gasser; Peter Heutink; Joshua M Shulman; Nicholas W Wood; David A Hinds; John A Hardy; Huw R Morris; Jacob Gratten; Peter M Visscher; Robert R Graham; Andrew B Singleton
Journal: Lancet Neurol Date: 2019-12 Impact factor: 44.182

8. Investigation of somatic CNVs in brains of synucleinopathy cases using targeted SNCA analysis and single cell sequencing.

Authors: Diego Perez-Rodriguez; Maria Kalyva; Melissa Leija-Salazar; Tammaryn Lashley; Maxime Tarabichi; Viorica Chelban; Steve Gentleman; Lucia Schottlaender; Hannah Franklin; George Vasmatzis; Henry Houlden; Anthony H V Schapira; Thomas T Warner; Janice L Holton; Zane Jaunmuktane; Christos Proukakis
Journal: Acta Neuropathol Commun Date: 2019-12-23 Impact factor: 7.801

9. Monogenic and polygenic inheritance become instruments for clonal selection.

Authors: Po-Ru Loh; Giulio Genovese; Steven A McCarroll
Journal: Nature Date: 2020-06-24 Impact factor: 49.962

10. The mutational constraint spectrum quantified from variation in 141,456 humans.

Authors: Konrad J Karczewski; Laurent C Francioli; Grace Tiao; Beryl B Cummings; Jessica Alföldi; Qingbo Wang; Ryan L Collins; Kristen M Laricchia; Andrea Ganna; Daniel P Birnbaum; Laura D Gauthier; Harrison Brand; Matthew Solomonson; Nicholas A Watts; Daniel Rhodes; Moriel Singer-Berk; Eleina M England; Eleanor G Seaby; Jack A Kosmicki; Raymond K Walters; Katherine Tashman; Yossi Farjoun; Eric Banks; Timothy Poterba; Arcturus Wang; Cotton Seed; Nicola Whiffin; Jessica X Chong; Kaitlin E Samocha; Emma Pierce-Hoffman; Zachary Zappala; Anne H O'Donnell-Luria; Eric Vallabh Minikel; Ben Weisburd; Monkol Lek; James S Ware; Christopher Vittal; Irina M Armean; Louis Bergelson; Kristian Cibulskis; Kristen M Connolly; Miguel Covarrubias; Stacey Donnelly; Steven Ferriera; Stacey Gabriel; Jeff Gentry; Namrata Gupta; Thibault Jeandet; Diane Kaplan; Christopher Llanwarne; Ruchi Munshi; Sam Novod; Nikelle Petrillo; David Roazen; Valentin Ruano-Rubio; Andrea Saltzman; Molly Schleicher; Jose Soto; Kathleen Tibbetts; Charlotte Tolonen; Gordon Wade; Michael E Talkowski; Benjamin M Neale; Mark J Daly; Daniel G MacArthur
Journal: Nature Date: 2020-05-27 Impact factor: 69.504