Long genes are more frequently affected by somatic mutations and show reduced expression in Alzheimer's disease: Implications for disease etiology.

Sourena Soheili-Nezhad, Robert J van der Linden, Marcel Olde Rikkert, Emma Sprooten, Geert Poelmans.

Abstract

Aging, the greatest risk factor for Alzheimer's disease (AD), may lead to the accumulation of somatic mutations in neurons. We investigated whether somatic mutations, specifically in longer genes, are implicated in AD etiology. First, we modeled the theoretical likelihood of genes being affected by aging-induced somatic mutations, dependent on their length. We then tested this model and found that long genes are indeed more affected by somatic mutations and that their expression is more frequently reduced in AD brains. Furthermore, using gene-set enrichment analysis, we investigated the potential consequences of such long gene disruption. We found that long genes are involved in synaptic adhesion and other synaptic pathways that are predicted to be inhibited in the brains of AD patients. Taken together, our findings indicate that long gene-dependent synaptic impairment may contribute to AD pathogenesis.
© 2020 The Authors. Alzheimer's & Dementia published by Wiley Periodicals, Inc. on behalf of Alzheimer's Association.

Keywords:  Alzheimer's disease; DNA damage; long genes; somatic mutations; synaptic adhesion

Year:  2020        PMID: 33075204      PMCID: PMC8048495          DOI: 10.1002/alz.12211

Source DB:  PubMed          Journal:  Alzheimers Dement        ISSN: 1552-5260            Impact factor:   21.566


NARRATIVE

Dementia affects 50 million people worldwide, and Alzheimer's disease (AD) is the most common form of dementia, accounting for two‐thirds of all cases. AD is a neurodegenerative disease that is characterized by a decline in memory and cognitive function. Symptoms worsen and become increasingly diverse and impairing with age, and AD causes much distress for patients and their loved ones. At best, current treatments provide some symptom relief and give patients and their families more time to prepare for the inevitable decline. Thus far, efforts to develop disease‐modifying medications for AD have been unsuccessful because knowledge of the (disturbed) biological processes underlying the disease is lacking or incomplete. New insights into AD pathogenesis are therefore urgently needed to guide treatment development. In this article, we propose a new neurobiological mechanism underlying AD, namely, that somatic mutations that accumulate with aging especially affect long genes and lead to decreased long gene expression, which in turn results in disturbed synaptic function. We provide evidence for this hypothesis by analyzing publicly available somatic mutation and gene expression data that were generated and generously provided by other researchers, so no tissue processing was required on our side. We first give a short overview of the genetic risk factors associated with AD. Subsequently, we introduce and validate a theoretical model of how aging, the most important risk factor for AD, is associated with the accumulation of somatic mutations, particularly in longer genes. Then, we demonstrate that in AD brains, longer genes are more frequently affected by somatic mutations and show reduced expression, which is predicted to lead to synaptic impairment. Finally, we make suggestions for future research that arise from these insights.

Inherited risk factors for AD

In a small percentage of AD cases, a clear monogenic cause is present. People carrying rare pathogenic variants (or mutations) in one of three genes (APP, PSEN1, or PSEN2) have a dominantly inherited form of AD with an early age at onset (<65 years). Conversely, late‐onset AD (LOAD, ≥65 years) represents the vast majority of AD cases and has a multifactorial etiology: it is caused by the cumulative effect of multiple genetic risk factors combined with lifestyle and environmental factors. The strongest common genetic risk factor for LOAD is the ε4 allele of the apolipoprotein E (APOE) gene (APOEε4). In addition, an increasing number of inherited common and rare risk variants have been associated with LOAD through genome‐wide association studies (GWASs) and whole exome/genome sequencing studies, respectively. Typically, AD candidate genes and their possible involvement in disease progression have been interpreted in the framework of early‐onset AD mutations and the main pathological hallmarks of both early‐ and late‐onset AD, that is, the development of extracellular plaques containing amyloid beta (Aβ) and intracellular tangles of hyperphosphorylated tau protein.

RESEARCH IN CONTEXT

Systematic review: The occurrence of somatic mutations with increasing age, the greatest risk factor for Alzheimer's disease (AD), has been hypothesized to occur in the brain and hence contribute to AD pathogenesis. We searched the literature (PubMed) for studies that used sequencing techniques to identify aging‐associated somatic mutations in brain regions and individual neurons of AD patients and healthy controls. Interpretation: We found that aging‐associated somatic mutations in the brain more often affect longer genes. These long genes show reduced expression in AD brains and encode proteins that are involved in synaptic pathways that are inhibited in AD‐relevant brain regions, especially the hippocampus. Future directions: To add to our understanding of the effect of long gene disruptions in AD, additional studies are needed in which both RNA sequencing and somatic mutation analysis would be conducted in single neurons from post‐mortem AD hippocampal tissue.

Aging‐induced somatic mutations in AD

Increasing age is the greatest risk factor for AD, but the (causal) mechanisms through which aging may lead to the development of plaques and tangles and to clinical deterioration in patients are incompletely understood. Of interest, somatic mutations increase with age. These are non‐inherited genetic variants that arise in a person's cells (eg, neurons in the brain) during his/her lifetime and are not transmitted to future generations. In this respect, whole‐genome sequencing of individual neurons from the dentate gyrus, a part of the hippocampus that is the most affected brain region in AD, has recently shown that (healthy) aging of the brain is associated with the accumulation of somatic single nucleotide variants (sSNVs) at a more or less linear rate of approximately 40 sSNVs per neuron per year. This type of DNA damage appears to accumulate in a random manner with increasing age. If the burden of sSNVs is uniformly scattered at purely random genomic positions, genes spanning longer portions of the genome are expected to accumulate more sSNVs than genes that are smaller in size. Therefore, this age‐related accumulation of sSNVs may disproportionately affect longer genes. Beyond the effects of healthy aging, a higher burden or accelerated accumulation of sSNVs can have neuropathological effects. For example, in Cockayne syndrome, a disease associated with brain atrophy and cognitive decline, patients are affected by higher rates of sSNVs due to impaired DNA repair, and this is especially the case for slow‐ or non‐proliferative cells such as neurons. Although such rare genetic syndromes represent an extreme form of DNA repair dysfunction, they clearly show that genomic maintenance is an active process in neurons. There is substantial evidence for deficiencies in DNA repair in AD as well.
Analysis of sSNVs in the hippocampus has also revealed both clock‐like and oxidative stress–induced signatures, suggesting that there are factors that increase the total mutational burden over and above the typical DNA damage that is part of normal aging. AD‐vulnerable brain regions are among the most metabolically active regions of the brain. This high energy demand may make these regions more susceptible to oxidative stress damage than other parts of the brain. Of interest, (long) neuronal genes are known to selectively map to common fragile sites of genomic instability, further increasing their vulnerability to DNA damage. sSNVs tend to inactivate genes, leading to reduced expression and function of their encoded proteins. In this respect, it is interesting that, when comparing the hippocampus of old versus young cognitively normal individuals (approximately 80 years vs approximately 20 years old), an overrepresentation of reduced gene expression was reported for longer genes. In turn, this could disrupt the cellular pathways in which these proteins are involved, which may be relevant to AD pathogenesis.

Long genes are more frequently affected in AD

We built a theoretical model that predicts the likelihood of a gene being affected by at least one sSNV based on its length and the age of an individual. Our model estimated that an average‐sized gene has a 1% chance of acquiring at least one mutation by age 65 (ie, the age threshold for a LOAD diagnosis). In contrast, the likelihood for the longest gene in the genome, CNTNAP2, to be affected by an sSNV by age 65 is markedly higher, at 60%. We tested our model using publicly available post‐mortem brain sSNV data from three studies. This confirmed our predictions: The 272 longest genes in the genome (ie, genes with a log size of more than two standard deviations above the mean) were overrepresented among the genes affected by sSNVs in all three data sets, and the length of genes with sSNVs was longer than average in all three studies. As indicated above, sSNVs are likely to lead to reduced expression (and function) of the affected genes. Therefore, we tested if, compared to healthy individuals, this reduced expression of longer genes can be observed more frequently in (the brains of) AD patients. We confirmed that long genes were indeed much more likely than shorter genes to show reduced expression in AD brains. This abnormal expression pattern was found in six brain regions commonly affected in AD (temporal cortex, superior temporal gyrus, parahippocampal gyrus, inferior frontal gyrus, dorsolateral prefrontal cortex, and especially hippocampus). In contrast, in two other brain regions that are more resilient to AD (frontal pole and cerebellum), longer genes were not more likely to have reduced expression. Furthermore, to show that the observed reduced gene expression is not due to neuronal loss resulting from AD itself, we analyzed data of differential gene expression between AD patients and controls in individual neurons from the entorhinal cortex, a brain area in the vicinity of the hippocampus that is (also) among the first to be affected in AD. 
We found that longer genes are more likely to show reduced expression in individual inhibitory neurons from this brain region, but this effect was not seen in excitatory neurons.

Long genes encode proteins involved in synaptic pathways

Long genes are more likely to have brain‐related functions and to be expressed in the brain. To further explore the implications of long gene susceptibility to sSNVs, we performed pathway analysis to investigate which biological processes and molecular networks were enriched in the set of 272 longest genes in the genome. This indicated that long genes are involved in multiple synaptic functions such as synaptic organization, adhesion, transmission, and plasticity. We found that several of these synaptic pathways were also enriched within the differential gene expression data from the eight AD‐related brain regions. Furthermore, based on the direction of gene expression, the “synaptogenesis signaling” pathway was predicted to be inhibited in three brain regions (temporal cortex, parahippocampal gyrus, and hippocampus), whereas four additional synaptic function‐related pathways were inhibited in the hippocampus. Moreover, our finding of reduced expression of longer genes in inhibitory but not excitatory neurons of the entorhinal cortex (see above) fits very well with a recent paper that identified a diminished synaptic inhibitory‐excitatory balance in mouse entorhinal cortex as an early AD marker, preceding Aβ plaque formation.

Synaptic impairment in AD

Converging evidence from the literature suggests that synaptic impairment may be critical in AD development and progression. For instance, many of the AD GWAS genes encode proteins with essential roles in synaptic function and adhesion. The relevance of impaired synaptic processes in AD is further corroborated by the fact that synaptic loss is the strongest neuropathological correlate of cognitive decline in AD. Of interest, genes that have been associated through GWASs with AD‐related brain phenotypes (for example, hippocampal volume and cerebrospinal fluid levels of phosphorylated tau) are also enriched for processes such as synaptic plasticity. In addition, the familial AD genes APP and PSEN1/2 have roles in synaptogenesis, synaptic adhesion, and neurotransmission. Moreover, LRP1B is encoded by one of the genome's longest genes and is highly expressed in AD‐vulnerable brain regions. This protein serves as a receptor of apoE (with APOE ε4 being the strongest common genetic risk factor for LOAD, see above), and it interacts with both APP and the postsynaptic scaffolding protein PSD95, implying a role in regulating synaptic function. In this way, the very long gene LRP1B may link early‐ and late‐onset AD through the effect of its encoded protein on synaptic signaling. Finally, several very long genes that we found to be both affected by sSNVs and downregulated in the hippocampus of AD patients encode proteins with important roles in synaptic function. For example, CNTNAP2, the longest human gene (see above), encodes a neuronal adhesion molecule and has been further implicated in AD etiology through the latest meta‐analytic GWAS. Reduced CNTNAP2 expression levels have also been observed in the AD hippocampus. Other examples of proteins encoded by very long genes that are both affected by sSNVs and downregulated in the AD hippocampus include PTPRT, a regulator of synaptogenesis, and RIMS2, a modulator of neurotransmitter release. 
In addition, the SLC4A10 and RYR2 proteins are involved in synaptic plasticity.

Limitations

Our study has two main limitations. First, the sSNVs that we analyzed to test our model were derived from two studies that examined bulk hippocampal tissue and not single neurons. Therefore, it is possible that some of these sSNVs occurred in brain cells other than neurons or as the result of developmental mosaicism, and follow‐up studies are needed that specifically investigate sSNVs at the level of single neurons in AD (see below). Second, because post‐mortem tissue is necessarily collected late in the disease course, the observed reduced expression of long genes in AD‐vulnerable brain regions could be a consequence of synaptic loss resulting from the AD pathology itself — due to neuronal atrophy — rather than being the underlying cause of it. However, in support of our findings, we also observed that the expression of long genes is reduced in single inhibitory neurons of AD patients.

Conclusions and directions for future research

We found that, through aging, longer genes are more frequently affected by sSNVs in the hippocampus, the most affected brain region in AD. In addition, long genes show reduced expression in multiple brain regions of AD patients, including the hippocampus. Furthermore, we showed that many of the longest genes in the genome code for proteins that are involved in synaptic adhesion and function. Based on expression data, these synaptic pathways were also predicted to be inhibited in the AD hippocampus and other AD‐vulnerable brain regions that are important for memory and cognition. Taken together, our findings provide novel insights into how aging‐induced DNA damage may promote AD pathogenesis by negatively affecting synaptic function. As for future research, we propose three main avenues to pursue. First, studies are needed that conduct concurrent RNA and DNA sequencing of single neurons and possibly other cell types from brain tissue samples of AD patients and non‐demented controls. In this way, the putative causal relationship between specific sSNVs in (long) genes and their reduced expression could be corroborated. Second, studies should be conducted that are aimed at further unraveling the links between sSNVs in specific genes, synaptic impairment, and AD pathology. With regard to the latter, studies in animal or cellular models that manipulate the functions of specific long genes can be instrumental in elucidating the molecular chain of events following sSNVs. Third, animal models could be used to investigate earlier disease stages, in order to confirm that the observed reduction in (synaptic) gene expression and function drives the AD pathology rather than being the result of end‐stage disease.

CONSOLIDATED RESULTS AND STUDY DESIGN

Model for sSNV likelihood through aging: effect of gene length

The human genome contains 20,535 unique protein‐coding genes that vary greatly in size (data retrieved from Ensembl Biomart [GENCODE v19, GRCh37p13]). The distribution of human gene length has a long tail encompassing extremely long genes in the megabase pair range (Figure 1A). After log‐transformation, 272 genes have a log base pair (bp) size of more than two standard deviations above the mean and are designated as the set of “very long” genes (Figure 1B and Supplementary Table 1).
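As a rough illustration of how such a cutoff can be derived, the Python sketch below computes the length threshold at two standard deviations above the mean of log10 gene length. The function name, the choice of the population standard deviation, and the toy input are assumptions made for illustration; only the parameters μ = 4.35 and σ = 0.68 are taken from the Figure 1B caption.

```python
import math
import statistics


def very_long_cutoff(lengths_bp, n_sd=2.0):
    """Return the length (bp) at mean + n_sd * SD of log10(gene length).

    Assumption: population SD is used; the article does not say which
    estimator was applied.
    """
    logs = [math.log10(x) for x in lengths_bp]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs)
    return 10 ** (mu + n_sd * sigma)


# Cutoff implied by the rounded parameters reported for Figure 1B.
MU, SIGMA = 4.35, 0.68
reported_cutoff_bp = 10 ** (MU + 2 * SIGMA)

# Toy input (hypothetical lengths), only to show the function call.
toy_cutoff_bp = very_long_cutoff([1e3, 1e4, 1e5])
```

With the rounded parameters, the implied cutoff is roughly 513 kb; the exact value used in the study depends on the unrounded μ and σ.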
FIGURE 1

Longer genes have an increased likelihood to be affected by sSNVs. (A) The length distribution of human genes has a long tail that extends toward a group of extremely long genes of 1‐2 megabase pairs (the 272 very long genes are indicated with open circles, see below); gene length is given in base pairs (bp). Gene length information was retrieved from Ensembl Biomart (GENCODE v19, GRCh37p13). (B) Gene length follows a log‐normal distribution with parameters μ = 4.35 (22.5 kb) and σ = 0.68 (dashed line). The outlier bin near 1 kb represents the large family of olfactory receptors that have gone through extreme evolutionary expansion. The 272 genes that are indicated by the open circles in 1B and in the shaded gray area under the curve in 1C show the subgroup of very long genes (genes with gene length > μ+2σ) that were used for the enrichment analyses in this study. (C) Binomial probability model for gene conservation over time in which somatic mutations (sSNVs) take place at a fixed and uniform rate across the genome; age is given in years (y). An average‐sized gene mostly survives the mutational burden of aging, with only ≈1% of its copies being affected by somatic mutations in a 65‐year‐old subject. For longer genes, however, ≈60% of copies are expected to have been affected by at least one sSNV between the sixth and seventh decades of life. (D) sSNVs occur more often in longer genes (Kolmogorov‐Smirnov test: P < 1.0 × 10^−4). Gene length distributions for genes having potential pathogenic sSNVs from the studies by Park et al. (red, 208 genes), Ivashko‐Pachima et al. (blue, 499 genes), Lodato et al. (green, 175 genes), and all human protein‐coding genes (black, 20,535 genes) are shown. Circles following the same color code plotted below the density graph represent individual gene lengths. Box plots visualize the median with flanking lower and upper hinges (corresponding to the 25th and 75th percentiles), and the whiskers represent the 95% confidence interval.

Assuming that the ≈40 sSNVs that accumulate yearly in neurons occur at random positions in their genomes (6.4 billion base pairs), the probability per year, per nucleotide of acquiring an sSNV (ω) is 6.2 × 10^−9. Hence, we modeled the chance for an sSNV occurring in a gene with the binomial equation $P = 1 - (1 - \omega)^{aL}$, in which $P$ is the probability that a gene copy has acquired at least one sSNV, $a$ is subject age in years, ω is the per‐nucleotide probability of an sSNV per year, and $L$ is the number of DNA nucleotides forming the transcribed region of the gene of interest (ie, the gene length). For an average‐sized gene (22.5 kbp, based on log distribution, see below), our model estimates that this mutational rate would result in ≈0.9% of gene copies acquiring at least one sSNV by the age of 65 (Figure 1C). The model further predicts that longer genes are more likely to be affected by sSNVs. For instance, 65 years of the same mutational rate is expected to affect 60.5% of the copies of the longest human gene, CNTNAP2 (Figure 1C). We then tested whether sSNVs are more likely to affect longer genes by comparing the lengths of genes affected by sSNVs in the hippocampus to the gene length distribution of all protein‐coding genes that we retrieved from Ensembl Biomart (see above), using Kolmogorov‐Smirnov tests. We retrieved human hippocampal sSNV data from three recent publications: Lodato et al., Park et al., and Ivashko‐Pachima et al. (see “Detailed Methods and Results” and Table 4). As predicted by our model, the length of sSNV‐harboring genes is longer than average (P = 1.0 × 10^−4) (Figure 1D). Using hypergeometric tests, we found that the set of 272 very long genes is enriched in the single cell sSNVs from the Lodato et al. study (4.3‐fold increase, P = 9.6 × 10^−5), and in the hippocampal sSNVs from the studies by Park et al. and Ivashko‐Pachima et al. (3.3‐fold increase, P = 1.41 × 10^−3, and 2.4‐fold increase, P = 6.90 × 10^−4, respectively).
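The arithmetic of the model described above can be checked with a short Python sketch. It evaluates the probability of at least one sSNV as 1 − (1 − ω)^(age × length), using the rate and genome size quoted in the text. The function name is hypothetical, and the CNTNAP2 length of about 2.3 Mb is an assumption, since the article does not state it.

```python
import math

SSNV_PER_YEAR = 40        # sSNVs per neuron per year (from the text)
GENOME_BP = 6.4e9         # genome size in base pairs (from the text)
OMEGA = SSNV_PER_YEAR / GENOME_BP   # per-nucleotide, per-year probability


def p_at_least_one_ssnv(length_bp, age_years, omega=OMEGA):
    """Probability that a gene copy carries >= 1 sSNV at a given age.

    Computes 1 - (1 - omega) ** (age_years * length_bp); log1p/expm1
    are used only to keep precision for a very small omega.
    """
    return -math.expm1(age_years * length_bp * math.log1p(-omega))


AVERAGE_GENE_BP = 22_500      # average-sized gene (from the text)
CNTNAP2_BP = 2_300_000        # assumption: approximate length, not from the text

p_average = p_at_least_one_ssnv(AVERAGE_GENE_BP, 65)
p_cntnap2 = p_at_least_one_ssnv(CNTNAP2_BP, 65)
```

Under these assumptions the sketch gives just under 1% for the average-sized gene and about 60% for a gene of 2.3 Mb, in line with the figures reported for age 65.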
TABLE 4

Data resource information for data used in this article

Type of data | Brain region | Details | Original paper
sSNV | Dentate gyrus/prefrontal cortex | NIH NeuroBioBank; WGS of single isolated neuronal nuclei | Lodato et al. 14
sSNV | Hippocampus | Netherlands Brain Bank and Human Brain and Spinal Fluid Resource Center; WES of laser capture microdissected hippocampal formations | Park et al. 19
sSNV | Hippocampus | Banner Sun Health Research Institute; RNA‐seq based mutation analysis (from GSE67333) | Ivashko‐Pachima et al. 46
RNAseq | Cerebellum | Mayo Clinic (AMP‐AD); bulk tissue | Allen et al. 50
RNAseq | Temporal cortex | Mayo Clinic (AMP‐AD); bulk tissue | Allen et al. 50
RNAseq | Frontal pole (BA10) | Mount Sinai/JJ Peters VA Medical Center Brain Bank (AMP‐AD); bulk tissue | Wang et al. 51
RNAseq | Superior temporal gyrus (BA22) | Mount Sinai/JJ Peters VA Medical Center Brain Bank (AMP‐AD); bulk tissue | Wang et al. 51
RNAseq | Parahippocampal gyrus (BA36) | Mount Sinai/JJ Peters VA Medical Center Brain Bank (AMP‐AD); bulk tissue | Wang et al. 51
RNAseq | Inferior frontal gyrus (BA44) | Mount Sinai/JJ Peters VA Medical Center Brain Bank (AMP‐AD); bulk tissue | Wang et al. 51
RNAseq | Dorsolateral prefrontal cortex | Mount Sinai/JJ Peters VA Medical Center Brain Bank (AMP‐AD); bulk tissue | Mostafavi et al. 52
RNAseq | Hippocampus | Netherlands Brain Bank; bulk tissue | Van Rooij et al. 53
RNAseq | Entorhinal cortex | Victorian Brain Bank; single‐nucleus RNA sequencing | Grubman et al. 54

NOTE. Abbreviations: AMP‐AD, Accelerating Medicines Partnership Alzheimer's Disease Project; BA, Brodmann area; RNAseq, RNA sequencing; sSNV, somatic single nucleotide variant; WES, whole exome sequencing; WGS, whole genome sequencing.

Transcriptomic data analyses

We extracted differentially expressed genes, that is, genes that show significantly increased or decreased expression when comparing AD patients to non‐demented controls, from previously published RNA sequencing data resources for nine brain regions (cerebellum, temporal cortex, frontal pole, superior temporal gyrus, parahippocampal gyrus, inferior frontal gyrus, dorsolateral prefrontal cortex [DLPFC], hippocampus, and [single cell data from the] entorhinal cortex) (see “Detailed Methods and Results”). The transcriptomic data were filtered for protein‐coding genes and binned into 50 consecutive groups based on transcribed gene length. We compared the number of genes showing AD‐associated decreased expression in each bin with that of the total gene pool using hypergeometric tests. These analyses showed a sharp increase in downregulated genes at the far end of the gene length distribution (top 2%) in the temporal cortex (P = 1.2 × 10^−8), superior temporal gyrus (Brodmann Area [BA]22) (P = 1.6 × 10^−6), parahippocampal gyrus (BA36) (P = 1.3 × 10^−11), inferior frontal gyrus (BA44) (P = 3.1 × 10^−14), DLPFC (P = 5.0 × 10^−5), and hippocampus (P = 1.1 × 10^−16) (Figure 2). This effect was not observed in the cerebellum and the frontal pole (BA10), regions that are known to be more resilient to AD and can be considered negative controls (Figure 2). The analysis of single neuron transcriptome data from the entorhinal cortex revealed that genes in the top 2% bin of gene length showed reduced expression in inhibitory neurons (P = 8.2 × 10^−9) but not in excitatory neurons (Figure 3). Additional hypergeometric tests showed that the set of 272 very long genes was overrepresented within the significantly differentially expressed genes in the temporal cortex (P = 1.22 × 10^−5), BA22 (P = 1.74 × 10^−2), BA36 (P = 2.64 × 10^−8), and hippocampus (P = 6.47 × 10^−8). In contrast, we observed fewer differentially expressed very long genes in the cerebellum and DLPFC (P = 2.03 × 10^−5 and P = 2.01 × 10^−2) (Table 1). Eighteen of the 19 long genes affected by sSNVs from the Park et al. and Ivashko‐Pachima et al. studies for which RNA transcripts were detected showed decreased expression in the AD hippocampus (Table 2).
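The binned comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the equal-count binning, the one-sided upper-tail test, and the toy data are all hypothetical, and only the standard library is used.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are downregulated."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def binned_enrichment(genes, n_bins=50):
    """genes: list of (length_bp, is_downregulated) tuples.

    Sorts genes by length, splits them into n_bins consecutive groups, and
    tests each group for an excess of downregulated genes.
    Returns (bin_index, bin_size, downregulated_in_bin, p_value) tuples.
    """
    ordered = sorted(genes, key=lambda g: g[0])
    N = len(ordered)
    K = sum(1 for _, down in ordered if down)
    results = []
    for b in range(n_bins):
        chunk = ordered[b * N // n_bins:(b + 1) * N // n_bins]
        k = sum(1 for _, down in chunk if down)
        results.append((b, len(chunk), k, hypergeom_upper_tail(k, N, K, len(chunk))))
    return results


# Toy data (hypothetical): 1000 genes; the 20 longest are all downregulated,
# plus a sparse background of downregulated genes elsewhere.
toy = [(length, length > 980 or length % 50 == 0) for length in range(1, 1001)]
results = binned_enrichment(toy)
bonferroni = 0.05 / 50   # threshold for 50 bins
```

In the toy data only the last bin, holding the longest genes, falls below the Bonferroni threshold, which mirrors the shape of the effect reported for the top 2% of gene length.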
FIGURE 2

Long genes are significantly downregulated in AD‐relevant brain regions. Plots show differentially expressed genes, that is, genes that show significantly increased or decreased expression when comparing AD patients to non‐demented controls, from previously published RNA sequencing studies (Table 4). Protein‐coding genes are binned in 50 consecutive groups (gray bars), based on transcribed gene length. We compared the number of genes showing either increased or decreased expression in each bin (height of gray bar) with that of the total gene pool using hypergeometric tests (red circles, Bonferroni threshold for significance is indicated with dashed blue line)

FIGURE 3

Long genes are significantly downregulated in inhibitory neurons of the entorhinal cortex. Plots show differentially expressed genes, that is, genes that show significantly increased or decreased expression when comparing AD patients to non‐demented controls, in single inhibitory or excitatory neurons from the entorhinal cortex (Table 4). Protein‐coding genes are binned in 50 consecutive groups (gray bars), based on transcribed gene length. We compared the number of genes showing either increased or decreased expression in each bin (height of gray bar) with that of the total gene pool using hypergeometric tests (red circles, Bonferroni threshold for significance is indicated with dashed blue line)

TABLE 1

Over‐ and underrepresentation of very long genes in genes differentially expressed in the brain of AD patients

Brain region | Number of genes detected (very long) | Number of differentially expressed genes (very long) | Over‐/underrepresentation
Cerebellum | 14291 (258) | 5128 (63) | −1.47 (P = 2.03 × 10^−5)
Temporal cortex | 14292 (258) | 6129 (143) | 1.29 (P = 1.22 × 10^−5)
Frontal pole (BA10) | 13788 (263) | 334 (5) | −1.27 (P = 1.51 × 10^−1)
Superior temporal gyrus (BA22) | 13789 (263) | 688 (20) | 1.52 (P = 1.74 × 10^−2)
Parahippocampal gyrus (BA36) | 13789 (263) | 4814 (134) | 1.46 (P = 2.64 × 10^−8)
Inferior frontal gyrus (BA44) | 13789 (263) | 151 (3) | 1.04 (P = 2.27 × 10^−1)
Dorsolateral prefrontal cortex | 13512 (250) | 1647 (22) | −1.39 (P = 2.10 × 10^−2)
Hippocampus | 14533 (250) | 7411 (156) | 1.22 (P = 6.47 × 10^−5)

NOTE. A hypergeometric test was performed to generate the P‐values of over‐ and underrepresentation.

Abbreviation: AD, Alzheimer's disease.
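The over‐/underrepresentation values in Table 1 appear to be the ratio of observed to expected very long genes among the differentially expressed genes, with underrepresentation written as a negative reciprocal. That reading is an inference from the table's numbers, not something the article states, and the function name below is hypothetical. A minimal Python sketch of the calculation:

```python
def signed_fold(n_detected, n_very_long, n_de, n_de_very_long):
    """Observed/expected ratio of very long genes among DE genes.

    Assumption: ratios below 1 are reported as the negative reciprocal,
    which reproduces the signs shown in Table 1.
    """
    expected = n_very_long * n_de / n_detected
    ratio = n_de_very_long / expected
    return ratio if ratio >= 1 else -1 / ratio


# Counts copied from Table 1.
temporal_cortex = signed_fold(14292, 258, 6129, 143)
cerebellum = signed_fold(14291, 258, 5128, 63)
hippocampus = signed_fold(14533, 250, 7411, 156)
```

Rounded to two decimals, these calls return 1.29, −1.47, and 1.22, matching the corresponding rows of the table.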

TABLE 2

Genes that are affected by sSNVs in the hippocampus and differentially expressed in the AD hippocampus

sSNV study | Decreased mRNA expression in the AD hippocampus | Increased mRNA expression in the AD hippocampus
Park et al. 19 | CAMTA1, CNTNAP2, CSMD2, NRXN1, PTPRT | NAV2
Ivashko‐Pachima et al. 46 | ANK2, DCC, FAT3, GRIK2, HS6ST3, KALRN, MYT1L, NELL1, RIMS2, RYR2, SLC4A10, TENM2, TENM3 | (none)

Abbreviations: AD, Alzheimer's disease; sSNV, somatic single nucleotide variant.


Enrichment analyses

We used Ingenuity Pathway Analysis (IPA) to test for enrichment of canonical pathways within our predefined set of very long genes, using a false discovery rate (FDR) correction for multiple testing. Five of the 10 most enriched pathways in the very long gene set are directly related to synaptic function: “synaptogenesis signaling” (P = 3.47 × 10⁻⁶), “synaptic long‐term depression” (P = 1.02 × 10⁻⁵), and “CREB signaling in neurons” (P = 2.51 × 10⁻⁴), which are the three most significantly enriched pathways overall, as well as “synaptic long‐term potentiation” (P = 1.05 × 10⁻³) and “glutamate receptor signaling” (P = 2.57 × 10⁻³).

With the Panther classification system, we assessed enrichment of gene ontology (GO) terms within the same set of genes, using Fisher's exact test with FDR correction. In keeping with the IPA results, the GO term analysis revealed that the set of very long genes is enriched for multiple synaptic functions such as synaptic organization, adhesion, transmission, and plasticity. The full results of the enrichment analyses of the 272 longest genes are provided in the supplement (Supplementary Table 2).

We performed IPA canonical pathway enrichment analyses on the tissue‐level transcriptomic data (all differentially expressed protein‐coding genes with corrected P < 0.05) to predict whether the pathways that were enriched in the set of very long genes were activated or inhibited in AD brains. The “synaptogenesis signaling” pathway was predicted to be inhibited in three brain regions in which very long genes were overrepresented among the differentially expressed genes, that is, temporal cortex (P = 1.15 × 10⁻⁶; Z = −3.05), BA36 (P = 6.17 × 10⁻⁶; Z = −3.61), and hippocampus (P = 3.47 × 10⁻⁶; Z = −5.70). All four other synaptic function–related pathways that were enriched within the set of very long genes (“synaptic long‐term depression,” “CREB signaling in neurons,” “synaptic long‐term potentiation,” and “glutamate receptor signaling”) were also predicted to be inhibited based on the differentially expressed genes in the hippocampus (P = 1.20 × 10⁻⁸; Z = −2.28, P = 5.01 × 10⁻¹⁴; Z = −2.91, P = 3.89 × 10⁻¹⁰; Z = −3.21, and P = 3.09 × 10⁻⁷; Z = −4.36, respectively) (Table 3 and Supplementary Table 3).
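The statistical core of such an enrichment analysis, a right‐tailed Fisher's exact test per gene set followed by FDR correction, can be illustrated as follows. IPA and Panther are hosted tools whose internals are not reproduced here; the sketch below is a generic stand‐in that assumes Benjamini‐Hochberg as the FDR procedure, and the function names, pathway names, and gene symbols are hypothetical.

```python
from math import comb


def fisher_right_tail(overlap, set_size, pathway_size, universe_size):
    """One-sided Fisher's exact test: P(overlap or more) under the null."""
    top = min(set_size, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, set_size - i)
        for i in range(overlap, top + 1)
    )
    return favourable / comb(universe_size, set_size)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_set, pathways, universe):
    """Return (pathway, overlap, p, q) tuples sorted by p-value."""
    gene_set = gene_set & universe
    names = list(pathways)
    overlaps, pvals = [], []
    for name in names:
        members = pathways[name] & universe
        k = len(gene_set & members)
        overlaps.append(k)
        pvals.append(
            fisher_right_tail(k, len(gene_set), len(members), len(universe))
        )
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda row: row[2])


# Toy data (hypothetical): 200 genes, 20 "very long" genes, two pathways.
universe = {f"g{i}" for i in range(200)}
long_genes = {f"g{i}" for i in range(20)}
toy_pathways = {
    "toy_synaptic": {f"g{i}" for i in range(10)} | {f"g{i}" for i in range(100, 105)},
    "toy_metabolic": {f"g{i}" for i in range(150, 170)},
}
toy_results = enrich(long_genes, toy_pathways, universe)
for row in toy_results:
    print(row)
```

The choice of the gene universe (here, all 200 toy genes) strongly affects the resulting P‐values, so in a real analysis it should match the background used to define the gene set.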
TABLE 3

Canonical pathway enrichment analysis for differentially expressed genes in brain regions of AD patients

Pathway / Brain region

Brain regions: Cerebellum; Temporal cortex; Frontal pole (BA10); Superior temporal gyrus (BA22); Parahippocampal gyrus (BA36); Inferior frontal gyrus (BA44); DLPFC; Hippocampus

Synaptogenesis Signaling Pathway: P = 3.98 × 10⁻², Z‐score = 0.816; P = 1.15 × 10⁻⁶, Z‐score = −3.051; P = 4.90 × 10⁻⁴, Z‐score = −1.706; P = 6.17 × 10⁻⁶, Z‐score = −3.606; P = 7.95 × 10⁻¹, Z‐score = 0.426; P = 3.47 × 10⁻⁶, Z‐score = −5.692

Synaptic Long‐Term Depression: P = 1.85 × 10⁻¹, Z‐score = ND; P = 1.00 × 10⁻³, Z‐score = −0.707; P = 1.82 × 10⁻⁵, Z‐score = −1.265; P = 7.35 × 10⁻¹, Z‐score = 0.258; P = 1.20 × 10⁻⁸, Z‐score = −2.770

CREB Signaling in Neurons: P = 1.32 × 10⁻¹, Z‐score = ND; P = 2.19 × 10⁻², Z‐score = −1.633; P = 8.91 × 10⁻⁴, Z‐score = −1.890; P = 6.08 × 10⁻¹, Z‐score = 0; P = 5.01 × 10⁻¹⁴, Z‐score = −2.907

Synaptic Long‐Term Potentiation: P = 1.64 × 10⁻¹, Z‐score = ND; P = 1.95 × 10⁻², Z‐score = −1.342; P = 3.02 × 10⁻³, Z‐score = −1.633; P = 4.86 × 10⁻¹, Z‐score = −0.277; P = 3.89 × 10⁻¹⁰, Z‐score = −3.212

Glutamate Receptor Signaling: P = 2.84 × 10⁻¹, Z‐score = ND; P = 4.68 × 10⁻², Z‐score = ND; P = 9.12 × 10⁻³, Z‐score = ND; P = 7.95 × 10⁻¹, Z‐score = ND; P = 3.09 × 10⁻⁷, Z‐score = −4.359

NOTE. Results are shown for the five most significantly enriched synaptic pathways among the 272 longest genes in the genome (Supplementary Table 1). Significant P‐values (P < 0.05) and Z‐scores (Z ≤ −2 or Z ≥ 2) are indicated in bold.

Abbreviations: AD, Alzheimer's disease; BA, Brodmann area; DLPFC, dorsolateral prefrontal cortex; ND, Not determined.
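The significance rule stated in the Table 3 note (P < 0.05 for enrichment, Z ≤ −2 or Z ≥ 2 for a predicted direction) can be applied mechanically, as in the short sketch below. The function name and the wording of the returned labels are assumptions for illustration. The example values are the hippocampus P‐values reported in the text, paired with the Z‐scores listed last in each Table 3 row.

```python
def classify_pathway(p_value, z_score, alpha=0.05, z_cutoff=2.0):
    """Label a pathway using the thresholds from the Table 3 note.

    z_score may be None when the activation Z-score is not determined (ND).
    """
    if p_value >= alpha:
        return "not enriched"
    if z_score is None:
        return "enriched, direction not determined"
    if z_score <= -z_cutoff:
        return "enriched, predicted inhibited"
    if z_score >= z_cutoff:
        return "enriched, predicted activated"
    return "enriched, no clear direction"


# Hippocampus values (P from the text, Z from Table 3).
hippocampus = {
    "Synaptogenesis Signaling Pathway": (3.47e-6, -5.692),
    "Synaptic Long-Term Depression": (1.20e-8, -2.770),
    "CREB Signaling in Neurons": (5.01e-14, -2.907),
    "Synaptic Long-Term Potentiation": (3.89e-10, -3.212),
    "Glutamate Receptor Signaling": (3.09e-7, -4.359),
}
calls = {name: classify_pathway(p, z) for name, (p, z) in hippocampus.items()}
for name, call in calls.items():
    print(f"{name}: {call}")
```

Under these thresholds, all five synaptic pathways are labeled as enriched and predicted inhibited in the hippocampus, in line with the statement in the text.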


DETAILED METHODS AND RESULTS

To test our model and hypothesis, we used publicly available resources to obtain data of sSNVs in the hippocampus and differential gene expression in AD brain regions (Table 4).

Data resource information for data used in this article. Abbreviations: AMPAD, Accelerating Medicines Partnership Alzheimer's Disease Project; BA, Brodmann area; RNAseq, RNA sequencing; sSNV, somatic single nucleotide variant; WES, whole exome sequencing; WGS, whole genome sequencing.

sSNV data sets

Genes affected by exonic sSNVs in both people with early onset neurodegeneration due to genetic disorders of DNA repair and healthy controls were obtained from the whole‐genome sequencing study at single‐neuron level by Lodato et al. Genes affected by sSNVs in both AD patients and controls were retrieved from the studies at whole‐tissue level—more specifically, the hippocampus—by Park et al. and Ivashko‐Pachima et al.

RNA sequencing data sets

Furthermore, uniformly processed RNA sequencing (RNA‐seq) data (weighted linear model based on diagnosis) from seven brain areas (cerebellum, temporal cortex, frontal pole [Brodmann area (BA) 10], two subregions of the temporal cortex [superior temporal gyrus (BA22) and parahippocampal gyrus (BA36)], inferior frontal gyrus [BA44], and dorsolateral prefrontal cortex [DLPFC]) was obtained from the AMPAD knowledge portal on the Synapse platform (syn2580853). On the AMPAD knowledge portal, the data, analysis results, analytical methodology, and research tools generated by multiple consortia are made available with support of the National Institute on Aging's Alzheimer's Disease Translational Research Program. To study the hippocampus, an additional RNA‐seq data set (data of differential mRNA expression between AD patients and controls) from the Netherlands Brain Bank study was used. To show that the effects we observed are not due to changes in brain cell composition, we used a single‐cell RNA‐seq data set, that is, data of differential mRNA expression between AD patients and controls in individual neurons from the entorhinal cortex. Grubman et al. used the single‐cell transcriptomic profiles of these entorhinal neurons to classify neurons into inhibitory and excitatory cells, using the RCA (Reference Component Analysis) method.

CONFLICT OF INTEREST

Geert Poelmans is director of DrugTarget ID, Ltd. (The Netherlands). All other authors declare no conflicts of interest.
  53 in total

Review 1.  Energy metabolism and oxidative stress: impact on the metabolic syndrome and the aging process.

Authors:  Madlyn Frisard; Eric Ravussin
Journal:  Endocrine       Date:  2006-02       Impact factor: 3.633

Review 2.  Synaptic Impairment in Alzheimer's Disease: A Dysregulated Symphony.

Authors:  Stefania Forner; David Baglietto-Vargas; Alessandra C Martini; Laura Trujillo-Estrada; Frank M LaFerla
Journal:  Trends Neurosci       Date:  2017-05-08       Impact factor: 13.837

3.  Concurrent Single-Cell RNA and Targeted DNA Sequencing on an Automated Platform for Comeasurement of Genomic and Transcriptomic Signatures.

Authors:  Say Li Kong; Huipeng Li; Joyce A Tai; Elise T Courtois; Huay Mei Poh; Dawn Pingxi Lau; Yu Xuan Haw; Narayanan Gopalakrishna Iyer; Daniel Shao Weng Tan; Shyam Prabhakar; Dave Ruff; Axel M Hillmer
Journal:  Clin Chem       Date:  2018-12-06       Impact factor: 8.327

Review 4.  Common fragile sites, extremely large genes, neural development and cancer.

Authors:  David I Smith; Yu Zhu; Sarah McAvoy; Robert Kuhn
Journal:  Cancer Lett       Date:  2005-10-10       Impact factor: 8.679

Review 5.  Synaptic Dysfunction in Alzheimer's Disease: Aβ, Tau, and Epigenetic Alterations.

Authors:  Ke Li; Qing Wei; Fang-Fang Liu; Fan Hu; Ao-Ji Xie; Ling-Qiang Zhu; Dan Liu
Journal:  Mol Neurobiol       Date:  2017-04-29       Impact factor: 5.590

Review 6.  Protein tyrosine phosphatase PTPRT as a regulator of synaptic formation and neuronal development.

Authors:  Jae-Ran Lee
Journal:  BMB Rep       Date:  2015-05       Impact factor: 4.778

7.  Disruption of Slc4a10 augments neuronal excitability and modulates synaptic short-term plasticity.

Authors:  Anne Sinning; Lutz Liebmann; Christian A Hübner
Journal:  Front Cell Neurosci       Date:  2015-06-16       Impact factor: 5.505

8.  Human whole genome genotype and transcriptome data for Alzheimer's and other neurodegenerative diseases.

Authors:  Mariet Allen; Minerva M Carrasquillo; Cory Funk; Benjamin D Heavner; Fanggeng Zou; Curtis S Younkin; Jeremy D Burgess; High-Seng Chai; Julia Crook; James A Eddy; Hongdong Li; Ben Logsdon; Mette A Peters; Kristen K Dang; Xue Wang; Daniel Serie; Chen Wang; Thuy Nguyen; Sarah Lincoln; Kimberly Malphrus; Gina Bisceglio; Ma Li; Todd E Golde; Lara M Mangravite; Yan Asmann; Nathan D Price; Ronald C Petersen; Neill R Graff-Radford; Dennis W Dickson; Steven G Younkin; Nilüfer Ertekin-Taner
Journal:  Sci Data       Date:  2016-10-11       Impact factor: 6.444

9.  An anatomically comprehensive atlas of the adult human brain transcriptome.

Authors:  Michael J Hawrylycz; Ed S Lein; Angela L Guillozet-Bongaarts; Elaine H Shen; Lydia Ng; Jeremy A Miller; Louie N van de Lagemaat; Kimberly A Smith; Amanda Ebbert; Zackery L Riley; Chris Abajian; Christian F Beckmann; Amy Bernard; Darren Bertagnolli; Andrew F Boe; Preston M Cartagena; M Mallar Chakravarty; Mike Chapin; Jimmy Chong; Rachel A Dalley; Barry David Daly; Chinh Dang; Suvro Datta; Nick Dee; Tim A Dolbeare; Vance Faber; David Feng; David R Fowler; Jeff Goldy; Benjamin W Gregor; Zeb Haradon; David R Haynor; John G Hohmann; Steve Horvath; Robert E Howard; Andreas Jeromin; Jayson M Jochim; Marty Kinnunen; Christopher Lau; Evan T Lazarz; Changkyu Lee; Tracy A Lemon; Ling Li; Yang Li; John A Morris; Caroline C Overly; Patrick D Parker; Sheana E Parry; Melissa Reding; Joshua J Royall; Jay Schulkin; Pedro Adolfo Sequeira; Clifford R Slaughterbeck; Simon C Smith; Andy J Sodt; Susan M Sunkin; Beryl E Swanson; Marquis P Vawter; Derric Williams; Paul Wohnoutka; H Ronald Zielke; Daniel H Geschwind; Patrick R Hof; Stephen M Smith; Christof Koch; Seth G N Grant; Allan R Jones
Journal:  Nature       Date:  2012-09-20       Impact factor: 49.962

10.  Discovery of autism/intellectual disability somatic mutations in Alzheimer's brains: mutated ADNP cytoskeletal impairments and repair as a case study.

Authors:  Yanina Ivashko-Pachima; Adva Hadar; Iris Grigg; Vlasta Korenková; Oxana Kapitansky; Gidon Karmon; Michael Gershovits; C Laura Sayas; R Frank Kooy; Johannes Attems; David Gurwitz; Illana Gozes
Journal:  Mol Psychiatry       Date:  2019-10-30       Impact factor: 15.992
