Literature DB >> 33788386

"Protein aggregates" contain RNA and DNA, entrapped by misfolded proteins but largely rescued by slowing translational elongation.

Robert J Shmookler Reis1,2,3, Ramani Atluri2, Meenakshisundaram Balasubramaniam2, Jay Johnson3, Akshatha Ganne3, Srinivas Ayyadevara1,2.   

Abstract

All neurodegenerative diseases feature aggregates, which usually contain disease-specific diagnostic proteins; non-protein constituents, however, have rarely been explored. Aggregates from SY5Y-APPSw neuroblastoma, a cell model of familial Alzheimer's disease, were crosslinked and sequences of linked peptides identified. We constructed a normalized "contactome" comprising 11 subnetworks, centered on 24 high-connectivity hubs. Remarkably, all 24 are nucleic acid-binding proteins. This led us to isolate and sequence RNA and DNA from Alzheimer's and control aggregates. RNA fragments were mapped to the human genome by RNA-seq and DNA by ChIP-seq. Nearly all aggregate RNA sequences mapped to specific genes, whereas DNA fragments were predominantly intergenic. These nucleic acid mappings are all significantly nonrandom, making an artifactual origin extremely unlikely. RNA (mostly cytoplasmic) exceeded DNA (chiefly nuclear) by twofold to fivefold. RNA fragments recovered from AD tissue were ~1.5-to 2.5-fold more abundant than those recovered from control tissue, similar to the increase in protein. Aggregate abundances of specific RNA sequences were strikingly differential between cultured SY5Y-APPSw glioblastoma cells expressing APOE3 vs. APOE4, consistent with APOE4 competition for E-box/CLEAR motifs. We identified many G-quadruplex and viral sequences within RNA and DNA of aggregates, suggesting that sequestration of viral genomes may have driven the evolution of disordered nucleic acid-binding proteins. After RNA-interference knockdown of the translational-procession factor EEF2 to suppress translation in SY5Y-APPSw cells, the RNA content of aggregates declined by >90%, while reducing protein content by only 30% and altering DNA content by ≤10%. This implies that cotranslational misfolding of nascent proteins may ensnare polysomes into aggregates, accounting for most of their RNA content.
© 2021 The Authors. Aging Cell published by Anatomical Society and John Wiley & Sons Ltd. Published 2021. This article is a U.S. Government work and is in the public domain in the USA.

Entities:  

Keywords:  Alzheimer’s disease; DNA; RNA; aggregation; apolipoprotein E; beta amyloid; cotranslational misfolding; endogenous viruses; functional annotation; gene ontology; neurodegeneration; nucleic acid sequence; nucleic acids; protein aggregates; proteomics; retrotransposons

Mesh:

Substances:

Year:  2021        PMID: 33788386      PMCID: PMC8135009          DOI: 10.1111/acel.13326

Source DB:  PubMed          Journal:  Aging Cell        ISSN: 1474-9718            Impact factor:   9.304


Alzheimer's disease age‐matched controls APOE, apolipoprotein E (protein, gene) Amyloid beta (residues 1–42) Chromatin ImmunoPrecipitation ‐ sequencing Coordinated Lysosomal Expression and Regulation, a binding motif deoxyribonucleic acid EEF2, eukaryotic elongation factor 2 (protein, gene) G‐quadruplex binding protein microtubule‐associated protein 1A nucleic acids not significant (referring to a difference between groups) overexpressing post‐translational modification ribonucleic acid transcription factor

INTRODUCTION

Protein aggregation increases inexorably with aging in all animal species and in all tissues examined (Ayyadevara, Balasubramaniam, Johnson, et al., 2016; Ayyadevara, Balasubramaniam, Suri, et al., 2016; Brignull et al., 2007; Cohen et al., 2009; David et al., 2010; Dillin & Cohen, 2011; Reis‐Rodrigues et al., 2012). Specific aggregate components, diagnostic for each human neurodegenerative disease, are thought to play causal roles because their pathology‐associated mutation and/or overexpression are sufficient to confer heritable neuropathy in human pedigrees and in transgenic‐animal models (Bandyopadhyay et al., 2007; Dillin & Cohen, 2011; Li et al., 2013; Miller et al., 2010; Roodveldt et al., 2009). Proteins that require structural flexibility often incorporate disordered regions, thus rendering them vulnerable to aggregation; other proteins are only susceptible to aggregation after oxidation or specific post‐translational modifications (Ayyadevara et al., 2015, 2017; Ayyadevara, Balasubramaniam, Parcon, et al., 2016). We recently developed improved click‐chemistry crosslinking reagents and analytical software to identify adjacent proteins in aggregates, based on peptide‐peptide crosslinking, and we applied it to define the protein‐adherence network, or “contactome”, of aggregates. We began with total, sarkosyl‐insoluble aggregates isolated from SY5Y‐APPSw human neuroblastoma cells (Ayyadevara et al., 2017), a model of familial Alzheimer's disease (fAD). This work revealed a complex, non‐random structure of aggregates in which megahubs (very‐high‐connectivity hubs with ≥100 partners) and hub connectors (low‐connectivity proteins linking large hubs) contribute functionally to the assembly of large aggregates (Balasubramaniam et al., 2019). We noted marked enrichment among megahubs for large structural proteins such as titin, ankyrins 1 – 3, nesprins 1 – 3, MAP1A, and other neurofilament proteins, purely as a consequence of their size. We also observed significant enrichment for a variety of nucleic acid‐binding proteins (Balasubramaniam et al., 2019).

RESULTS

The aggregate interactome

To compensate for protein size variation, we reassessed the aggregate interactome with normalization for protein length. The intra‐aggregate contactome, based on length‐normalized connectivity (interaction number per residue), fell into 11 clusters comprising 24 “central hubs” (Figure 1; hubs with >4 edges, indicated by red circles). Four “hub connectors” of low degree (≤4 edges; green circles) bring together large hubs not otherwise connected. Remarkably, all 24 central hubs and 2 of 4 hub connectors are nucleic acid‐binding proteins (Figure 1), revealing a striking enrichment for proteins that bind RNA (N = 16; p < 3E‒150), or bind DNA (N = 6; p < 2E‒20), or both (N = 2). This supports and extends our earlier observation that nucleic acid‐binding proteins are especially susceptible to aggregation (Balasubramaniam et al., 2019) and led us to inquire whether their targets, RNA and DNA, might also be present in the entities we call “protein aggregates”.
FIGURE 1

The “aggregate contactome” of proteins isolated from SY5Y‐APPSw human neuroblastoma cells, an in vitro model of familial AD. The contactome was generated from proteomic data for cross‐linked peptide pairs in sarkosyl‐insoluble aggregates, using a modified version of X‐link Identifier (Balasubramaniam et al., 2019; Du et al., 2011), requiring ≥10 spectral hits per protein observed in at least 2 of 3 replicate crosslinking experiments. Hits were normalized to hub length (amino acids in the most abundant isoform). Red circles highlight central hubs with 5 or more large‐hub interactors; green circles show smaller hub‐connectors, which join major hubs not otherwise connected. Other proteins of interest are indicated by dashed gray circles

The “aggregate contactome” of proteins isolated from SY5Y‐APPSw human neuroblastoma cells, an in vitro model of familial AD. The contactome was generated from proteomic data for cross‐linked peptide pairs in sarkosyl‐insoluble aggregates, using a modified version of X‐link Identifier (Balasubramaniam et al., 2019; Du et al., 2011), requiring ≥10 spectral hits per protein observed in at least 2 of 3 replicate crosslinking experiments. Hits were normalized to hub length (amino acids in the most abundant isoform). Red circles highlight central hubs with 5 or more large‐hub interactors; green circles show smaller hub‐connectors, which join major hubs not otherwise connected. Other proteins of interest are indicated by dashed gray circles

Quantitation of aggregate nucleic acids from AD vs. control hippocampus, or human glioma cells

We isolated total sarkosyl‐insoluble aggregates from hippocampal tissue of individuals diagnosed with Alzheimer's disease (AD) and confirmed by histopathological markers, and from age‐matched controls (AMC) without dementia or AD‐diagnostic markers (amyloid deposits or hyperphosphorylated tau). From equal initial weights of hippocampus, quantified recoveries of nucleic acids increased in AD aggregates, over those in controls, by 1.5‐ to 2‐fold for DNA, and ~twofold for RNA (Figure 2A,B). These elevations did not differ significantly from the difference in protein content of total aggregates, which was ~60% higher in AD than in controls, in close agreement with previous results (Ayyadevara, Balasubramaniam, Parcon, et al., 2016). Among normal controls, there was fourfold to sixfold more RNA than DNA in total sarkosyl‐insoluble aggregates (p < 1E–5), regardless of the methods used for separation and quantitation (see Experimental Procedures). For AD samples, nucleic acid recoveries were higher and more variable, with roughly twice as much RNA as DNA (Figure 2B).
FIGURE 2

Recovery of DNA and RNA from aggregates. DNA and RNA were extracted and quantified from insoluble aggregates, isolated from hippocampi of AMC (APOE ε3/ε3), or AD (ε3/ε3 or ε4/ε4) individuals (A, each N = 3); or from similar mixes of APOE ε3/ε3 and ε3/ε4 individuals (B, each N = 5–6). (C, D) DNA and RNA were extracted from insoluble aggregates from T98G glioma cells. C, independent cultures of T98G cells overexpressing APOE ε3 from a transgene were compared to cultures expressing ε4. D, T98G cells without any transgene were lysed in 0.5% NP40, and nuclei separated from cytoplasm. DNA was chiefly associated with nuclear aggregates, and RNA with cytoplasmic aggregates, as expected. Means ±SDs are shown. p values reflect 2‐tailed heteroscedastic t tests.

Recovery of DNA and RNA from aggregates. DNA and RNA were extracted and quantified from insoluble aggregates, isolated from hippocampi of AMC (APOE ε3/ε3), or AD (ε3/ε3 or ε4/ε4) individuals (A, each N = 3); or from similar mixes of APOE ε3/ε3 and ε3/ε4 individuals (B, each N = 5–6). (C, D) DNA and RNA were extracted from insoluble aggregates from T98G glioma cells. C, independent cultures of T98G cells overexpressing APOE ε3 from a transgene were compared to cultures expressing ε4. D, T98G cells without any transgene were lysed in 0.5% NP40, and nuclei separated from cytoplasm. DNA was chiefly associated with nuclear aggregates, and RNA with cytoplasmic aggregates, as expected. Means ±SDs are shown. p values reflect 2‐tailed heteroscedastic t tests. Apolipoprotein E (APOE) gene alleles are the leading genetic risk factors for AD, with at least fourfold increased AD risk for each APOE ε4 allele (abbreviated APOE4, ε4, or E4), and increased severity of aggregate‐associated neuropathology for AD carriers of APOE4 alleles (Neu et al., 2017; Parker et al., 2005). We recently reported that the concerted transcription of autophagy genes is disrupted in the human glioblastoma cell line T98G, when overexpressing an APOE4 transgene rather than APOE3 (Parcon et al., 2018). To assess whether the greater nucleic acid content of aggregates in AD vs. AMC hippocampi may reflect the disruption of autophagy in AD, we separately analyzed aggregates from T98G cells that overexpress APOE ε3 or ε4 from transgenes. The DNA content of T98G/E4 aggregates was about twice that of T98G/E3 aggregates (Figure 2C; p < 0.001), whereas their RNA content declined a little (<15%, N. S.) with the APOE4 allele overexpressed. Most neuropathic aggregates are cytoplasmic, but may also be nuclear or extracellular. When we separated nuclei from cytoplasm of T98G cells prior to aggregate isolation, similar amounts of aggregate protein were recovered from each fraction. However, nuclear aggregates contained mostly DNA and only 40% as much RNA, while cytoplasmic aggregates contained ~10‐fold more RNA than DNA (Figure 2D).

Sequencing data for aggregate nucleic acids

To assess whether DNA and RNA fragments in aggregates are a random sampling from the genome and transcriptome, respectively, these nucleic acids were separately extracted from pooled aggregate preparations from either AD or age‐matched control (AMC) individuals (APOE ε3/ε4 heterozygotes; 3 subjects per group), and their sequences determined (UT Southwestern Genomics Core, Dallas TX). DNA fragments were analyzed using a ChIP‐seq protocol, suited to detection of site specificity, and were then mapped to the human genome. RNA fragments underwent a dual screening, comprising a test of peak significance (similar to ChIP‐seq) followed by RNA‐seq analysis of differential abundance, and mapping to the human genome. Of the 38 loci that showed significant DNA read peaks in at least one pool (each p < 10‒5 that the reads actually came from a uniform distribution), 25 reached that threshold in AD vs. 28 in AMC tissue, and 26 in T98G cells carrying APOE3 vs. 30 in APOE4 cells (Table 1, Supplementary Table S1). Each group contains 17‒20 peak loci with p < 10‒8, and one locus with p < 10‒50. Considering that a random representation of DNA reads would display a flat (uniform) distribution, these data provide decisive evidence that aggregate DNA fragments are not random, but most likely reflect specific binding sites for proteins that ensnared them into aggregates. Only 5 of these 38 DNA peaks (13%) mapped to known genes or ORFs: RP5‐857K21.4, LINC00486, DUX4L26, MAMDC2‐AS1, and ROCK1P1. The other 33 peaks mapped to intergenic regions. Of the 24 chromosomes represented, Y had the most DNA reads (47‒98 reads in each of 6 peaks), followed by chromosomes 4 (3 peaks of 48‒53 reads), 1 (2 peaks of 57‒65 reads), 10 (2 peaks of 35‒45 reads), 16 (2 peaks of 26‒35 reads), 17 (2 peaks of 24‒37 reads), and 21 (4 peaks of 9‒17 reads). Neither the numbers of DNA peaks or reads differed significantly between the groups compared (AD vs. AMC, or E3 vs. E4), with the sole exception of a Y‐chromosome peak mapping at 26,638,004‒ 26,638,595, which yielded only 5 reads in aggregates from AD, but 59 for AMC (Chi2 p < 0.005) and a range of 52‒63 reads for the 3 non‐AD samples.
TABLE 1

DNA Fragments Recovered from AD, AMC Aggregates (ChIP‐seq analysis)

Chrom.RegionLength5’ Gene5’ distance3’ Gene3’ DistanceAD PeaksAMC PeaksAD Peak p‐valueAMC Peak p‐valueGene Description (Nearest Gene)
1 633735..634300566RP5‐857 K21.336236 RP5‐857 K21.4 0144. E−024. E−06Transcribed processed pseudogene; also in all aggregate RNAs
1 143202361..143202903543RP11‐344P13.121621836CH17‐333 M13.214027656513. E−299. E−31
2 32916116..32916684569 LINC00486 0AL121656.510442224. E−112. E−13Long Intergenic Non‐Protein‐Coding RNA 486; also in all aggregate RNAs
2 89830806..898313785735S_rRNA230125IGKV2D−402041211112. E−059. E−06
3 75668953..75669543591 DUX4L26 0LINC009602847114. E−114. E−11Double Homeobox 4‐Like 26
3 93470298..93470861564HSPE1P193208589RNU6‐488P372902216. E−571. E−63
4 49149228..49149839612TPI1P4132073AC118282.34736334279. E−111. E−09
4 49711347..49711912566AC119751.4111837DCUN1D4213108713189. E−167. E−18
4 190179314..190179950637AC215524.14089ABC7‐42391500H16.35896332. E−133. E−13
5 49661354..49661926573CTD−2013 M15.13765383EMB73426515132. E−111. E−10
6 61322762..61323316555RP1‐91 N13.181450MTRNR2L9251011113. E−023. E−06
7 59995570..59996144575RP11‐548 K12.132159318RP11‐715L17.1227921609no peak2. E−03
8 43237602..43238166565VN1R46P16792RP11‐726G23.28588444. E−084. E−09
9 70038064..70038625562RN7SL570P153409 MAMDC2‐AS1 0118. E−071. E−05MAM Domain Containing 2 (antisense strand)
10 42070370..42070986617RP11‐96F8.13287261KSR1P17832339359. E−062. E−04
10 133688058..133688661604AL845259.22945DUX4L2951944222. E−053. E−04
12 35614697..35615271575AK6P11364738RP11‐125 N22.41926740118. E−035. E−05
13 18211736..18212297562AL356585.24266RP11‐341D18.441228112. E−116. E−25
16 34063731..34064293563RP11‐598D12.212369CTD−2522B17.856176557. E−052. E−04
16 34587942..34588505564BCLAF1P2319292CTD−2144E22.935347115168. E−167. E−18
16 46394505..46395067563PPP1R1AP21E+07ANKRD26P17427316123. E−234. E−23
17 21972876..21973349474KCNJ18268263AC144838.283237978. E−062. E−06
17 26603641..26604283643RP11‐846F4.113897955RP11‐260A9.137741017173. E−121. E−09
18 107991..108643653RP11‐683L23.613660 ROCK1P1 421121. E−079. E−10
18 110241..110848608 ROCK1P1 0MIR80781407229. E−092. E−08Rho‐Assoc'd Coiled‐Coil Prot. Kinase 1 ψgene1
20 31241472..31242039568AC104301.2339650DEFB1151562417133. E−185. E−16
21 7926020..7926634615SNORA1199794CH507‐338C24.3174616339. E−094. E−09
21 8806909..8807490582SNX18P105028bP−2189O9.138075112. E−041. E−05
21 10269991..10270557567CH507‐216 K13.1132986AP003900.657853145. E−038. E−06
21 10692593..10693162570IGHV1OR21‐142757AP001464.42306513424. E−156. E−15
22 18896073..18896656584AC008103.411390DGCR69371112. E−024. E−05
X 156030320..156030900581DDX11L162442111. E−061. E−06
Y 10746657..10747217561RNA5‐8SP6546350RP1‐85D24.42955041685. E−051. E−03
Y 11305019..11305652634AC134878.1120061DUX4L161265641. E−091. E−10
Y 11312236..11312823588AC134882.13884DUX4L172097451. E−088. E−09
Y 11324443..11324950508DUX4L181619DUX4L197378883. E−054. E−05
Y 11721743..11722304562RP11‐295P22.2247505RCC2P158747881. E−092. E−08
Y 26638004..26638595592FAM58CP10844CTBP2P13E+075598. E−033. E−13

All listed peaks differed significantly from a uniform distribution at p < 0.05 to p < 6E–57; peak‐coincident genes (at zero distance from peaks) are indicated by bold font. Peak counts were not corrected for the 1.6‐fold higher protein and DNA recovery from AD relative to AMC hippocampus, since all reflect normalized data from 1 µg DNA.

DNA Fragments Recovered from AD, AMC Aggregates (ChIP‐seq analysis) All listed peaks differed significantly from a uniform distribution at p < 0.05 to p < 6E–57; peak‐coincident genes (at zero distance from peaks) are indicated by bold font. Peak counts were not corrected for the 1.6‐fold higher protein and DNA recovery from AD relative to AMC hippocampus, since all reflect normalized data from 1 µg DNA. RNA differed from DNA in several important respects. Most notably, 81% of RNA peaks mapped to known or putative genes, vs. only 13% of DNA peaks. (Table 2, Supplementary Tables S2, S3; note that intergenic RNA peaks were omitted to conserve space.) The 49 “within‐peak” genes include 39 (80%) that differed significantly in read count between AD and AMC at p < 0.001 (2‐tailed Fisher exact tests), vs. 1 of 38 (2.6%) for DNA. It is noteworthy that 30 significantly differential genes were more abundant in AD, while only 9 (23%) were relatively enriched in AMC. This 3:1 bias is on top of the ~1.8‐fold higher abundance of RNA in AD aggregates (Figure 2A), since all counts were normalized to the source library. Among T98G glioblastoma genes (Supplementary Table S3), 54 of 59 RNA peaks (92%) differ between APOE3 and APOE4 at p < 0.0001, in marked contrast to DNA peaks of which none were significantly differential.
TABLE 2

RNA Fragments Recovered from AD, AMC Aggregates (RNA‐seq analysis)

Chrom.RegionLength5’ gene5’ dist.3’ gene3’ dist.AD readsAMC readsAD/AMC readsChi2 p<Transcript or encoded protein
1 633888..634164277RP5‐857 K21.336389 RP5‐857 K21.4 051513600.380.0001Transcribed processed pseudogene
1 2281959..2282363405 SKI 0MORN138889139619670.710.0001Ski proto‐oncogene
1 109100211..109100485275 SCARNA2 0RP5‐1065 J22.43049131981.340.05scaRNA, small Cajal‐body specific RNA2, guides mod'n of snRNAs
1 244863775..244864066292RP11‐11 N7.58277 HNRNPU 05096530.780.004Heterog. Nuclear Ribonucleoprot. U (transripts 1, 3, 4, 8)
2 32916221..32916538318 LINC00486 0AL121656.5105881753.400.02Long Intergenic Non‐Protein Coding RNA 486
2 47335292..47335571280CALM2158690 AC073283.4 0272820931.300.0002Uncharacterized transcript
2 148961908..148962183276 KIF5C 0LYPD6B75923305518561.650.0001Kinesin Family Member 5C (transcript KIF5C_2)
2 181678574..181678849276ITGA4142386 CERKL 0107601.780.001Ceramide Kinase Like
2 231460607..231460888282AC017104.27453 NCL 041271.52N. S.Ceroid‐Lipofuscinosis Neuronal Protein 5 (neurodegenerative lysosome storage disease)
2 233275721..233275994274 ATG16L1 0 SCARNA5 0400/11206/51.94/2.20.0001Autophagy Related 16‐Like 1 / scaRNA, small Cajal‐body RNA5
2 240713854..240714163310RP11‐118 M12.212168 KIF1A 0280649880.560.0001Kinesin family member 1A
3 160514943..160515216274RNU7‐136P42526 KPNA4 05362322.310.0001Karyopherin (Importin) Subunit Alpha 4
6 26204696..26204971276 HIST1H4E 0HIST1H2BG110941643040.540.0001Histone 1 (AD‐agg‐enriched)
6 34071883..34072162280MIR127571831 GRM4 01068955211.940.0001Glutamate Metabotropic Receptor 4
6 52995648..52995925278 RN7SK 0ICK535332569121382.680.00017SK RNA
6 99408034..99408309276COQ313829 PNISR 02061351.530.001PNN Interact. Ser & Arg‐Rich Prot. (Pinin, desmosome assoc. prot.)
7 5529123..5529444322MIR58933205 ACTB 0121017190.700.0001beta actin (AD‐agg‐enriched)
7 26200464..26200741278CTB−119C2.125518 HNRNPA2B1 0168991.700.0002HnRNP A2/B1 (AD‐agg‐enriched)
8 9903986..9904242257MIR597162217 LINC00599 0166711131.500.0001Long Intergenic Non‐Protein Coding RNA
8 27604977..27605252276GULOP15903 CLU 01561051.490.005Clusterin (a.k.a. ApoJ) (AD‐agg‐enriched)
9 35657748..35658022275SIT16797 RMRP_1 04721842.570.0001RNA Component of Mitoch. RNA Processing Endoribonuclease
10 101799051..101799385335NPM315637 MGEA5 / OGA 06834321.580.0001Hyaluronidase =OGA, removes O‐GlcNAc modifications
11 35663281..35663611331 TRIM44 0KRT18P1419663711475921.940.0001Tripartite Motif Containing 44
11 62841562..62841839278RP11‐727F15.97518 WDR74 03041492.040.0001WD Repeat Domain 74
11 65502716..65503064349AP000769.74310 MALAT1_1 030539164941.850.0001Metastasis Assoc. Lung Adenocarcinoma Transcr. 1 (Non‐Protein)
11 111911504..111911780277RPL37AP822029 CRYAB 02282161.06N. S.Crystalin A beta, a small HSP (AD‐agg‐enriched)
11 123061064..123061339276RPL31P479662 HSPA8 087115120.580.0001Heat Shock Protein Family A (Hsp70) Member 8 (= HSC70)
12 110281619..110281909291 ATP2A2 0RN7SL769P785424332251.920.0001 SERCA2 ATPase Sarcoplasmic/Endopl. Reticulum Ca2+ Transport2
13 45975357..45975649293AL445232.160203 ZC3H13 03722401.550.0001Zinc Finger CCCH‐Type Containing 13
14 20343094..20343370277SNORD12616567 RPPH1 023136343.650.0001Ribonuclease P RNA Component H1
14 23321579..23321860282 PABPN1 0SLC22A17244453412611.310.01Poly(A) Binding Protein Nuclear 1
14 49586595..49586871277RNA5SP38433846 RPS29 02280049584.600.0001Ribosomal protein, small subunit 29
14 49853648..49853923276RNU6‐539P13820 Metazoa_SRP_138 02401075543.180.0001RNA, cytoplasmic Signal Recognition Particle
14 102084750..102085033284RN7SL472P7277 HSP90AA1 0186941.980.0001HSP 90 family (AD‐agg‐enriched)
17 6452910..6453209300FAM64A1440 PITPNM3 06715201.290.004PITPNM 3, Phosphatidylinositol Transfer Protein Memb.‐ Assoc.
17 19187969..19188246278KYNUP118300 SNORD3A 0342993.450.0001small nucleolar RNA
17 44911190..44911459270AC015936.36643 GFAP 0705189890.780.0002Glial Fibrillary Acidic Protein, GFAP (AD‐agg‐enriched)
18 49814124..49814400277ACAA2163 RP11‐886H22.1 02951392.120.0001Transcribed processed pseudogene
19 13298568..13298866299CTC−250I14.1139779 CACNA1A 01026859841.720.0001Calcium Voltage‐Gated Channel Subunit Alpha1 A
19 36304598..36304871274 CTD−3162L10.1 0LINC0066581955561643.390.0001Uncharacterized locus
19 44907737..44908015279 APOE 0CTB−129P6.71359102810031.02N. S.ApoE, Apolipoprotein E (AD‐agg‐enriched)
19 48645763..48646036274DBP8324 CA11 04075730.710.0002Carbonic Anhydrase 11
19 49107777..49108115339 SNRNP70 0LIN7B6208209613981.500.0001Small Nuclear Ribonucleoprotein U1 Subunit 70
20 23637606..23637888283CST931729 CST3 05737420.770.003Cystatin C, Cystatin 3
20 41533110..41533383274RP4‐620E11.5150002 CHD6 012458641.440.0001Chromodomain Helicase DNA Binding Protein 6, CHDBP6
21 8258828..8259116289RNA5‐8S51894 FP671120.6 012001329.090.0001Uncharacterized transcript
21 8435771..8435981211 FP236383.5 0FP236383.1962071217.250.0001Uncharacterized transcript
21 26021821..26022101281LLPHP2258487 APP 0 14777022.100.0001Amyloid precursor protein, APP, A4 (AD‐agg‐enriched)
X 140784311..140784609299LINC0063211631 CDR1 036675629510.580.0001Cerebellar Degeneration Related Protein 1, CDRP1

All peaks shown were significant at p < 0.01; peak‐coincident genes (at zero distance from peaks) are indicated by bold font. Read ratios (AD/AMC) and p values for AD‐AMC differences were not corrected for 1.6‐fold higher RNA recovery from AD relative to AMC hippocampus, since all normalized RNA‐seq libraries used 1 µg RNA. “AD‐agg enriched” indicates proteins that were also found to be significantly enriched in AD aggregates relative to controls (Ayyadevara, Balasubramaniam, Parcon, et al., 2016).

RNA Fragments Recovered from AD, AMC Aggregates (RNA‐seq analysis) All peaks shown were significant at p < 0.01; peak‐coincident genes (at zero distance from peaks) are indicated by bold font. Read ratios (AD/AMC) and p values for AD‐AMC differences were not corrected for 1.6‐fold higher RNA recovery from AD relative to AMC hippocampus, since all normalized RNA‐seq libraries used 1 µg RNA. “AD‐agg enriched” indicates proteins that were also found to be significantly enriched in AD aggregates relative to controls (Ayyadevara, Balasubramaniam, Parcon, et al., 2016). Aggregate RNA reads that were substantially more abundant in AD than controls (Chi‐square or 2‐tailed Fisher exact p < 0.001) include two uncharacterized transcripts on chromosome 21, enriched 17‐fold and 9‐fold beyond other RNAs in AD; ribosomal protein/RPS29, 4.6‐fold; RNAse‐P/RPPH1_2, 3.7‐fold; nucleolar RNA/SNORD3A and long noncoding RNA/LINC00486, enriched 3.5‐ and 3.4‐fold; signal‐recognition‐particle RNAs (SRP_138 and RN7SK), 3.2‐ and 2.7‐fold; mitochondrial RNAse P1/RMRP1, 2.6‐fold; karyopherin/KPNA4, 2.3‐fold; amyloid precursor protein/APP, 2.1‐fold, and SERCA2/ATP2A2, 1.9‐fold (Table 2). It is noteworthy that 2 of the 5 genes identified in aggregate DNA, LINC00486 and RP5‐857K21.4, were also among the AD‐enriched transcripts in aggregates, and 9 of the 39 genes (23%) enriched in AD aggregate RNA, relative to AMC, encode proteins that were also enriched in AD‐specific aggregates (Ayyadevara, Balasubramaniam, Parcon, et al., 2016) (bold font in the rightmost column of Table 2). Because we had observed roughly twice as much DNA in aggregates isolated from glioblastoma cells overexpressing APOE4, as in identical cells expressing APOE3, we asked whether any particular loci or genes were differentially represented in their aggregates. DNA read counts from E3 and E4 aggregates were in fact quite similar for all DNA loci sequenced (Supplementary Table S1), whereas RNA sequencing data (listed in Supplementary Table S3) show striking increases in aggregate‐entrapped RNA transcripts isolated from APOE3‐overexpressing (OE) cells, relative to isogenic cells overexpressing APOE4. For 53 of the 59 genes that were confidently identified within fragment alignment peaks, the read count in APOE4‐OE cells differed significantly from APOE3‐OE cells at Fisher exact p < 10‒4, with E3/E4 ratios ranging from 1.7‒13.9. Only one gene appeared to be more abundant in the presence of excess APOE4, a long noncoding RNA for which there were too few reads to attain significance. This bias is consistent with evidence that APOE4 protein competes with transcription factor TFEB for the ~400 DNA binding sites containing the CLEAR motif, most of which drive expression of proteins involved in autophagy/lysosome functions (Sardiello, 2016). Differential RNA‐fragment abundances in glioblastoma aggregates, in which only the APOE allele differs between cell lines, tend to be highly significant (53 of 59 have p < 0.0001) and comprise an interesting set. Examples include α‐enolase (E3/E4 = 1.85), MHC‐II YBX1 (3.8), scaRNA2 (2.3), histones H2B (2.8) and H1 (6.7), HSP60 (1.6), HSP90‐A1 (1.8) and ‐B1 (1.9), HSP‐A8 (2.0) and HSP‐B1 (3.4), IGF‐BP3 (5.4) and ‐BP5 (2.0), NCL (2.0), prothymosin α (3.2), SPARC (2.4), nucleophosmin (2.5), RACK1 (3.0), 7SK small nuclear RNA (6.2), EEF1‐A1 (2.2), β‐actin (3.2), peptidylprolyl isomerase A (2.6), collagen 1A2 (4.1), vimentin (4.2), CD44 (2.7), cofilin 1 (3.6), GAPDH (2.9), α tubulin (2.9), RNAse P component H1 (9.2), ribosomal proteins RPS2 (4.2) and RPS29 (13.9), 7SL RNA 2 (8.7), β2‐microglobulin (4.1), annexin A2 (3.4), pyruvate kinase M (2.4), profilin 1 (3.7), γ‐actin (3.4), APOE (3.6), ferritin light chain 1 (3.0), galectin 1 (5.0), and filamin A (2.1). With lesser significance, we find synapsin 3 (E3/E4 = 3.5; p < 0.0002), sequestosome_1/SQSTM1 (1.3; p < 0.004) and vimentin antisense (15; p < 0.002). In both the direction and magnitude of the RNA‐abundance shift, the influence of Alzheimer's disease was less consistent and so appeared less pronounced on average, than that of the APOE allele. This is almost certainly due to genetic and environmental variance among AD and AMC subjects (Ayyadevara, Balasubramaniam, Parcon, et al., 2016), in contrast to the single transgene that distinguishes T98G/E3 from T98G/E4 cells. Because all human subjects considered in the present comparison were APOE3/E4 heterozygotes, the AD effect could not have arisen from a difference in APOE genotypes. The prevailing reduction in RNA content of E4 aggregates, for the most differentially expressed genes, may reflect transcriptional suppression of TFEB targets by APOE4 (Parcon et al., 2018), rather than an impact of the APOE allele on aggregation per se. Mapping the RNA transcripts to the human genome revealed a remarkable cluster of at least 20 intergenic loci in a relatively silent segment (21p11.2‒21p12) of the chromosome 21 short arm (Supplementary Figure S1). While these loci are not differentially represented for the most part, either between AD and AMC or between APOE3 and APOE4, they include 2 loci with the highest AD/AMC ratios we observed, 9.1 and 17.2 (each Chi2 p < 10‒6, Table 2).

Annotation enrichment meta‐analysis of RNA fragments in aggregates

Although we had expected the RNA fragments embedded in aggregates to comprise a random selection from the transcriptome, gene ontology and pathway term enrichment analysis (functional‐annotation clustering in DAVIDTM, http://david.ncifcrf.gov) revealed highly significant enrichment for specific groups of RNAs. Focusing on genes with RNA reads that map to significant peaks and are differentially abundant in T98G/E3 vs. T98G/E4 glioblastoma cells, DAVID meta‐analysis revealed highly significant enrichment clusters for gene annotations relating to [extracellular exosome + acetylation + phosphoprotein + nucleus, acetylation + poly(A) binding], [Ubl conjugation + cadherin binding + cell‐cell adherens junction ], [glycoprotein binding + protein stabilization], and [myelin sheath + unfolded protein response + protein refolding + stress response + chaperone] (Table 3A).
TABLE 3

DAVID Meta‐Analysis of Top RNA‐seq Peaks from Aggregates

A. Genes from E3, E4 reads (55 DAVID IDs, implicating 16 clusters of terms sharing members)
Cluster #, GO TermCluster Enrich.CountFold Enrich.Benjamini
1, Extracellular exosome10.45375.59E−21
Acetylation10.45314.23E−12
Phosphoprotein10.45352.08E−6
Nucleus10.45272.14E−4
2, Acetylation8.65314.23E−12
Poly(A) RNA binding8.65217.11E−10
Ubiquitinlike (Ubl) conjugation8.65205.53E−8
3, Ubiquitinlike (Ubl) conjugation6.79205.53E−8
Cadherin binding, cell‐cell adhesion6.799126E−5
Cell‐cell adherens junction6.798122E−5
4, Glycoprotein binding3.585306E−4
Protein stabilization3.586174E−3
5, Myelin sheath2.778222E−6
Response to unfolded protein2.775452E−3
Protein refolding2.7741002E−3
Stress response2.775231E−3
Chaperone2.776141E−3

N.B.: Minor terms were omitted from each cluster. Cluster enrichment is the “Enrichment Score” from Functional Annotation Clustering under DAVID; fold enrichment is “Fold Change” per term; Benjamini indicates the false discovery rate, FDR, predicted by the Benjamini‐Hochberg procedure.

DAVID Meta‐Analysis of Top RNA‐seq Peaks from Aggregates N.B.: Minor terms were omitted from each cluster. Cluster enrichment is the “Enrichment Score” from Functional Annotation Clustering under DAVID; fold enrichment is “Fold Change” per term; Benjamini indicates the false discovery rate, FDR, predicted by the Benjamini‐Hochberg procedure. Among genes with well‐mapped reads that differ significantly between aggregates from AD vs. AMC, the most enriched clusters include [intracellular ribonucleoprotein complex + methylation + Ubl conjugation + poly(A) RNA binding + acetylation ], [extracellular matrix + chaperone + ATPase activity], [nucleoplasm + nucleus ], and [myelin sheath + unfolded protein binding + chaperone ] (Table 3B). The very existence of these clusters, and their marked overlap between meta‐analyses derived from gene lists of very different origin (aggregates from cultured glioblastoma cells vs. human hippocampi) despite only 11 common members, suggests that the underlying aggregate‐RNA fragments are strikingly nonrandom in nature. The specific annotation terms that were most enriched (Table 3C) are likely to reflect the nature of proteins that coalesce in AD and AD‐model aggregates, which include terms (fold enrichment) such as protein refolding (70), MHC class II protein complex binding (62), oxidation (56), amyloidosis (43), response to unfolded protein (31), glycoprotein binding (27), intracellular ribonucleoprotein complex (18), unfolded protein binding (16), and neurodegeneration (9).

What mechanisms account for RNA and DNA fragments co‐aggregating with proteins?

What does the inclusion of DNA and RNA fragments imply about aggregates or the mechanism of aggregation? Clearly, there are proteins in aggregates that evolved to bind both nucleic acids and proteins. RNA assumes many transient structures constrained chiefly by its duplex regions, which form A‐helices. DNA, in addition to its repertoire of relatively stable structures (A‐, B‐ and Z‐duplex helices, triplex, and G‐quadruplex forms), in the course of replication and transcription can adopt as wide a range of single‐stranded structures as RNA. Affinity for nucleic acids, as well as the protein constituents of multimeric RNA‐ and DNA‐binding complexes, may require protein structures that are at least partially disordered (Zhang et al., 2013) and/or are highly polar, which in turn may favor aggregation (Babu, 2016; Kovacech et al., 2010). DNA‐binding proteins include histones; high‐mobility‐group (HMG) proteins; constituents of DNA replication, transcription, and repair complexes (e.g. topoisomerases, helicases and polymerases; transcription factors, co‐factors, and repressors); and proteins that stabilize or remodel chromatin (Figure 1) (Li et al., 2006; Mitchell & Tjian, 1989; Stoyanova et al., 2009; Wade, 2001). A key feature shared by many DNA‐binding proteins, in addition to structural instability, is an excess of positively charged residues—allowing formation of electrostatic bonds to the negatively charged phosphates that link DNA‐backbone sugars. RNA‐binding proteins include splicing factors, translational initiation and elongation factors, ribosomal and associated proteins (e.g., refolding chaperones), signal recognition particles, and proteins involved in the processing and functions of noncoding RNAs (Figure 1) (Castello et al., 2012; Glisovic et al., 2008; Hentze et al., 2018; Turner & Hodson, 2012). There are also diverse proteins that bind both RNA and DNA—including RNA polymerases and other transcription‐complex components, RNA/DNA helicases, and TAR DNA‐binding protein (TDP43/TADBP) (Gao et al., 2019; Hudson & Ortlund, 2014; Kobren & Singh, 2019; Nikpour & Salavati, 2019; Norman et al., 2016; Shi & Berg, 1995; Zacco et al., 2019). Such proteins may account for the presence in aggregates of both RNA and DNA fragments mapping to LINC00486 and RP5‐857K21.4 loci. G‐quadruplex binding proteins (G4BPs) could be responsible for the presence of certain DNA and RNA segments in aggregates. Of the 39 DNA loci listed in Table 1, 13 (33%) had predicted G‐quadruplex‐forming sequences at >100‐fold higher likelihood than predicted at random, whereas 18 (46%) were <20‐fold above random expectation (Supplementary Table S4 and Figure S2A; note that numbers differ slightly due to binning). This partition into G4‐rich and G4‐poor regions suggests that a subset of DNA fragments may have been “recruited” into aggregates by G4BPs. A similar but less extreme split was observed for RNA fragments listed in this table: 15 of 51 peaks (30%) had ratios >100, vs. 11 (22%) with ratios <20 (Table S2 and Figure S2B).

Viral RNA and DNA fragments are enriched in AD aggregates relative to AMC

RNA and DNA fragments recovered from aggregates include sequences that do not map to the consensus human genome, but are related to known viral sequences compiled in the VirTect Database (Khan et al., 2019; Xia et al., 2019). After removal of all sequence reads homologous to the human genome, the remainder were mapped to the VirTect virus‐sequence library. Raw viral RNA reads comprise 0.09% of total AMC RNA‐fragment sequences (124,803/132,252,624), and 0.15% of AD RNA‐fragments (236,939/158,235,930). Viral DNA reads comprise 0.33% of total AMC or AD DNA‐fragment sequences (124,805/37,225,119 for AMC, 146,271/44,743,221 for AD). In view of their scarcity, such fragments are unlikely to drive aggregation; moreover, only a small minority of total raw reads met all three of the stringent VirTect thresholds (coverage depth ≥5x, a continuous/contiguous region ≥100 nt, and a read count ≥400) required for positive identification of human viruses. A total of 7 human viruses met all criteria, for a total of >135,000 reads (Table 4), out of >800 viruses or viral fragments detected (271,000 DNA reads, 362,000 RNA reads). As a negative control, the C. elegans genome was screened with identical parameters, yielding zero viral reads.
TABLE 4

Human viral sequences in RNA and DNA fragments recovered from insoluble hippocampal aggregates

Viral Genome Sequence (viral classification group)AMC_RNAf ReadsAD_RNAf ReadsAMC_DNAf ReadsAD_DNAf ReadsAMC RNA/DNAAD RNA/DNARNA (AD/AMC)DNA (AD/AMC)
NC_022518.1_HERV_K113 (ssRNA‐RT)1,2892,2423,5864,6980.360.481.741.31
NC_001806.1_Human_herpesvirus_1 (dsDNA)

(162)

(327)1921830.841.792.020.95
NC_001798.1_Human_herpesvirus_2 (dsDNA)12,806***28,0521,403(1,589)9.1317.652.191.13
NC_007605.1_Human_herpesvirus_4 (dsDNA)(137)2342001780.691.311.710.89
NC_000898.1_Human_herpesvirus_6B (dsDNA)(521)***1,4762,4112,6270.220.562.831.09
NC_012959.1_Human_adenovirus_54 (dsDNA)273440(14)(10)19.5044.001.610.71
NC_009823.1_Hepatitis_C_virus_type_2 (+ssRNA)(5,492)***(15,501)19,181(22,028)0.290.702.821.15
gi|60955|lcl|HPV6REF.1|_Human_papillomavirus_6_(HPV6) (dsDNA)(82)(140)(151)2180.540.641.711.44
gi|9627396|lcl|HPV9REF.1|_Human_papillomavirus_9_(HPV9) (dsDNA)(121)***(405)94(112)1.293.623.351.19
gi|1491683|lcl|HPV72REF.1|_Human_papillomavirus_72_(HPV72) (dsDNA)334**(692)(22)(33)15.1820.972.071.50
TOTALS 21,217 ***49,509 27,254 31,676

Values in parentheses failed to meet one or more thresholds, but are included here for purposes of comparison. AD RNA counts differ from AMC by Chi‐squared test: **p<0.001; ***p<0.0001.

Human viral sequences in RNA and DNA fragments recovered from insoluble hippocampal aggregates (162) Values in parentheses failed to meet one or more thresholds, but are included here for purposes of comparison. AD RNA counts differ from AMC by Chi‐squared test: **p<0.001; ***p<0.0001. For 5 of the 10 viruses shown in Table 4, viral RNA fragments were significantly enriched in AD over AMC (at p < 0.01 to p < 0.0001), relative to the 1.6‐fold AD enrichment of aggregate proteins (Ayyadevara, Balasubramaniam, Parcon, et al., 2016), for a combined significance of p < 1E–18. Three of these viruses, all linear duplex DNA viruses, were substantially more abundant in RNA than in DNA: Herpesvirus 2 (RNA/DNA = 9.1 in AMC, 17.7 in AD); Human Adenovirus 54 (RNA/DNA = 19.5 in AMC, 44.0 in AD); and Human Papillomavirus 72 (RNA/DNA = 15.2 in AMC, 21.0 in AD). Most of the remaining viruses were more highly represented in DNA than RNA, signifying that they were transcriptionally inactive. The consistently greater RNA/DNA ratios in AD tissue than in AMC is intrinsically “corrected” for relative viral and aggregate abundance, and strongly implies greater transcription and/or aggregation of RNA in AD hippocampus relative to AMC.

Cotranslational aggregation

As noted above, RNA reads substantially exceeded DNA reads by twofold to fivefold (Figure 2A). The propensity for nucleic acid‐binding proteins to be inherently disordered, suggested above as an explanation for entrapment of nucleic acids in aggregates, is not expected to differ greatly between RNA‐ and DNA‐binding proteins. We propose another mechanism, specific to RNA, that would account for the greater abundance of RNA in aggregates: cotranslational misfolding. Among the RNAs identified in AD‐model aggregates (Table 2), many encode proteins that are themselves enriched in AD aggregates: for example, HnRNP_A2/B1, clusterin/ApoJ, β‐crystallin A (CRYAB), SERCA_2/ATP2A2, GFAP, APOE, and Amyloid Precursor Protein/APP (Ayyadevara, Balasubramaniam, Parcon, et al., 2016). Of the 49 genes with RNA positively identified in aggregates, 9 (18%) encoded proteins that were also identified in aggregates. Twenty‐three (46%) of the same 49 RNAs were significantly more abundant in AD aggregates than in controls, while 6 (12%) were significantly enriched in AD aggregates as both RNA and protein. During translation, nascent proteins are at high risk for misfolding and aggregation until entire structural domains have emerged from the ribosome. From bacteria to mammals, chaperone complexes that include members of the HSP40, HSP60, and HSP90 families are closely associated with ribosomes, where they counteract misfolding of nascent polypeptides (Deuerling et al., 2019; Zhang & Ignatova, 2011). Nevertheless, the fraction of newly synthesized proteins that is cotranslationally degraded can exceed 50% (Turner & Varshavsky, 2000), indicating that chaperone protection is highly fallible. We wondered whether the remarkable abundance in aggregates of diverse RNA fragments, the great majority of which contain coding sequences, might be a clue that cotranslational aggregation occurs when misfolded, nascent proteins are neither prevented from misfolding nor degraded, prior to their coalescence with other misfolded proteins to form insoluble aggregates. If this is the case, then interventions that arrest or delay translation should sharply reduce the aggregate content of RNA fragments. We used shRNA knockdown of EEF2 mRNA, reducing its steady‐state level by 33% (Figure 3A,B) to attenuate protein translation in SY5Y‐APPSw human neuroblastoma cells. Suppression of EEF2 has been shown to extend lifespan, reduce stress response, and improve the balance of protein quality control (Anisimova et al., 2018; David et al., 2010; Tavernarakis, 2008; Turner & Varshavsky, 2000). While prior research showed the existence of co‐translational protein misfolding and degradation (G. Zhang & Ignatova, 2011), our results suggest that slowing translation may reduce aggregation of misfolded proteins, both in C. elegans (data not shown) and in cultured human cells as follows. In SY5Y‐APPSw cells, shRNA targeting EEF2 eliminated over 90% of the RNA entrapped in aggregates (p < 0.0001; Figure 3C,D), far exceeding the 33% efficacy of EEF2 knockdown (Figure 3B). At the same time, this RNAi exposure had little or no effect on aggregate DNA content (Figure 3E,F), but reduced aggregate protein by 30% (p < 0.01; Figure 3G,H). In SY5Y‐APPSw cells treated for 4 h with MG132, a cell‐permeant proteasome inhibitor, aggregates increased 20–30%; however, this rise was not accompanied by any increase in aggregate RNA fragments (Figure S3). This suggests that the reduction in aggregate burden per se cannot account for the decline in aggregate RNA after EEF2 knockdown.
FIGURE 3

Effects of EEF2 knockdown on the composition of aggregates in SY5Y‐APPSw cells. Results shown in each panel comprise data from 3 independent cell expansions treated with shRNA constructs targeting EEF2, or 2 scrambled RNAs for controls. Replicate experiments produced similar results. (A, B), Western‐blot quantitation of EEF2 knockdown efficacy, evaluated by EEF2 protein; efficacies of individual shRNAs (constructs a, b, c in Methods) are superimposed in B. (C, D), total RNA fragments in aggregates, quantified by gel staining with SYBR Gold. (E, F) Total DNA fragments in aggregates, quantified by ethidium bromide fluorescence. (G, H) Total aggregate protein, quantified by staining with SYPRO Ruby. p values shown here are based on 2‐tailed heteroscedastic t tests, for 3 – 4 experiments, combining data from shRNAs a – c

Effects of EEF2 knockdown on the composition of aggregates in SY5Y‐APPSw cells. Results shown in each panel comprise data from 3 independent cell expansions treated with shRNA constructs targeting EEF2, or 2 scrambled RNAs for controls. Replicate experiments produced similar results. (A, B), Western‐blot quantitation of EEF2 knockdown efficacy, evaluated by EEF2 protein; efficacies of individual shRNAs (constructs a, b, c in Methods) are superimposed in B. (C, D), total RNA fragments in aggregates, quantified by gel staining with SYBR Gold. (E, F) Total DNA fragments in aggregates, quantified by ethidium bromide fluorescence. (G, H) Total aggregate protein, quantified by staining with SYPRO Ruby. p values shown here are based on 2‐tailed heteroscedastic t tests, for 3 – 4 experiments, combining data from shRNAs a – c

DISCUSSION

Pathognomonic complexes associated with neurodegenerative diseases, including Alzheimer's, Parkinson's, and Huntington's diseases, are widely termed “protein aggregates” because their diagnostic antigenic markers are proteins. Whether these aggregates also contain other components, however, is a question that has not been adequately addressed. We were aware that some amalgamations of cell debris that accumulate with aging, known as lipofuscin granules, contain a complex mixture of oxidized, glycated and carbonylated proteins, lipids, and possibly other carbohydrates; however, nucleic acids were only rarely noted among their constituents (Cindrova‐Davies et al., 2018; Nowotny et al., 2014). Ginsberg et al. (1998, 1999) reported that 80% of neurofibrillary tangles and 55% of senile (amyloid) plaques can be stained with acridine orange, implying the presence of RNA. Numerous studies have implicated nucleic acid binding by mammalian prion‐like protein, PrP (Cordeiro et al., 2014; Gomes et al., 2012; Macedo et al., 2012; Silva et al., 2008), and the evidence that this extends to other neurodegenerative‐disease seed proteins has been reviewed (Cordeiro et al., 2014). We were led from the results of proteomic “contactome” studies, intended to define the molecular architecture of aggregates (i.e., which proteins adhere to which other proteins), to investigate the nature, extent, and specificity of nucleic acids incorporated into aggregates. In each of these three respects, the results were unexpected. We observed two‐ to fivefold more RNA than DNA in aggregates, whether isolated from AD or control hippocampus (Figure 2A). Many RNA sequences identified in human hippocampal aggregates were differentially abundant in AD‐ vs. control‐derived aggregates; of these, twice as many were enriched significantly in AD aggregates as in non‐AD controls. Proteomic analyses of aggregates from equal‐weight aliquots of AD vs. AMC hippocampus samples indicate an AD/AMC ratio of 1.84 (t test p < 0.01), in reasonable agreement with previous AD/AMC protein ratios of 1.65 and 1.66 for Aβ and tau aggregates, respectively (Ayyadevara, Balasubramaniam, Parcon, et al., 2016), and do not differ significantly from the ratios observed here for RNA and DNA. Interactome complexities of Aβ and tau aggregates (unpublished data) indicate AD/AMC ratios of 1.84 and 1.64, respectively (each t test p < 0.01)—implying more abundant and varied protein interfaces in AD than in AMC hippocampus. When we compared aggregates isolated from glioblastoma cells overexpressing an APOE3 vs. APOE4 transgene, sequences with the most differential representation were quite consistently more abundant in APOE3‐bearing cells. We believe this very likely reflects the surprising ability of APOE4 to enter nuclei and bind competitively to the CLEAR/E‐box motifs recognized by transcription factor EB (TFEB), thereby inhibiting expression of autophagy and lysosomal genes (Parcon et al., 2018). Not surprisingly, >90% of aggregate DNA originated from nuclear aggregates, while RNA in aggregates was predominantly of cytoplasmic origin. Only a small fraction of aggregate‐associated nucleic acids (0.09 – 0.15% of RNA reads, 0.33% of a smaller set of DNA reads) appears to be of viral origin, although these totals may be underestimated due to as‐yet‐uncatalogued and mutated viruses or endogenous retroposons (Sanjuan et al., 2010). The striking 2.3‐fold enrichment of viral RNA sequences in AD aggregates relative to controls, vs. only 1.15‐fold for viral DNA fragments (see Table 4), is consistent with possible roles of viral infection and/or transcriptional activation in the etiology of Alzheimer's disease (Balin & Hudson, 2018; Irish et al., 2009; Kreutz, 2002; Kristensson, 1992; Linet al., 1997; Romeo et al., 2019; Steel & Eslick, 2015). It is also possible that the observed enrichments reflect secondary effects of Alzheimer's pathology, including chronic low‐grade inflammation (Majde, 2010), insofar as they may augment viral infection or transcriptional activation in the AD brain. Previous studies have shown that soluble amyloid‐like proteins bind to nucleic acids, which could lead to formation of amyloid fibrils (Di Domizio et al., 2012). Nucleic acid‐containing amyloid fibrils induce interferon and activate innate immune Toll‐like receptors, driving neuroinflammation and synapse loss in AD (Di Domizio et al., 2012; Roy et al., 2020). Somatically integrated and even endogenous (heritable) viral genomes have highly variable insertion sites. As a result, viral RNAs and DNAs require identification by searching a database of known virus genomes. Quantitation of viral sequences may thus be underestimated due to the many human viruses as yet unidentified, plus the high viral mutation rates impeding sequence alignment. Nevertheless, viral RNA and DNA comprise very small fractions of the nucleic acids recovered from aggregates. From an evolutionary perspective, however, they may ultimately be responsible for the perseverance in our genomes of proteins with high levels of disorder and high probability of aggregation—provided only that disorder contributes to the ability to bind viral nucleic acids and/or to sequester them in aggregates. The observed data are consistent with a scenario in which endogenous retroviruses—of which HERV K113 is the youngest and only actively transposing exemplar (Boller et al., 2008)—and integrated genomic copies of retroviruses (e.g., Hepatitis C viruses) and DNA viruses (e.g., Herpes viruses) become activated and transcribed into RNA. Darwinian selection might favor protein variants that are predisposed to misfold, provided that they disable replication and transcription of viral genomes within cells by entrapping them in aggregates. Variants that enhanced survival of a pandemic by even a few percent would undergo strong selective pressure to sweep the population, becoming the predominant or sole alleles (Karlsson et al., 2014). Predicted G‐quadruplex‐forming sequences in both DNA and RNA, the best known and most abundant class of four‐stranded nucleic acid structures, are also markedly enriched in AD aggregates. Sequences with G‐quadruplex‐forming potential can be recognized by their binding proteins based on singular structural features; they thus often serve as recognition sites for critical proteins with key surveillance or regulatory functions, such as telomere‐binding proteins, viral‐replication proteins, and gene promotor regions (Brazda et al., 2014). The observation of consistent functional‐annotation terms and clusters, both within each aggregate type and between the two sources of aggregates, confirms that the particular RNA species found in aggregates are not a random sampling from the transcriptome—but it does not explain the basis for their enrichment. We propose two routes by which nucleic acids can be incorporated into aggregates that form either as a result of aging per se or due to an age‐dependent pathology such as Alzheimer's disease: (1) “hitchhiker” or “bystander” entrapment of DNA and RNA, when they are bound by proteins that become misfolded and consequently enmeshed in aggregates; and (2) cotranslational misfolding of proteins in the midst of their translation, which might be expected to also ensnare ribosomes and the mRNAs they are translating. The first mechanism is supported by the remarkably high abundance of DNA‐ and RNA‐binding proteins in the aggregate interactome (Figure 1). The second mechanism is most compellingly supported by the decimation (>10‐fold reduction) of aggregate RNA content following shRNA knockdown of the translational procession factor EEF2. We suspect that cotranslational aggregation occurs preferentially in pathways or processes that involve enzymes with multiple partners, and/or several nucleic acid‐binding proteins—thus accounting for the highly significant enrichments observed in aggregated RNA, for genes annotated with specific clusters of descriptive terms. Note that neither of these explanations attributes a primary or causal role to nucleic acids, through which they would “drive” aggregate accrual. Rather, they are collateral casualties due to misfolding of their attached proteins. Why did EEF2 knockdown have a far greater effect on RNA content than protein content of aggregates? This is actually the expected result if cotranslational aggregation accounts for only a minor fraction of the protein deposition in aggregates, but is responsible for 90% of their RNA content. Nascent proteins may misfold transiently during translation, but even mature proteins can misfold over time, as a consequence of post‐translational disturbances such as oxidation, phosphorylation, or alkylation, and other temperature‐ or time‐dependent processes that favor misfolding of pre‐existing proteins. Such processes would continue with little prospect of reversal, for all previously‐synthesized proteins—unabated by translational arrest. RNA, however, may only appear in aggregates when it is bound by a misfolded (and hence aggregation‐prone) protein, or when the RNA is in the process of translation into a nascent protein that has a high probability of transient misfolding and aggregation. Our observations imply that cotranslational aggregation is the predominant route, accounting for at least 90% of aggregate RNA. Our data suggest that proteostasis in SY5Y‐APPSw cells, which are subjected to chronic ER stress by continual generation of Aβ1–42, is normally insufficient to prevent cotranslational aggregation. However, even moderate alleviation of that stress appears to shift the balance back to sustainable translational proteostasis. Translational inhibition has been reported to lower chronic inflammation (Mazumder et al., 2010), which may be a consequence of reduced protein aggregation, augmented by a disproportionate decrease in aggregate RNA.

CONCLUSIONS

“Protein aggregates” contain nucleic acid constituents that are highly nonrandom in sequence—making it unlikely that they are artifacts, but instead implying that they contain protein‐binding features (including G‐quadruplexes) that might pull them into aggregates. The number and variety of viral sequences found in aggregates suggests that there may be an evolutionary advantage (i.e., antiviral protection) to the synthesis of nucleic acid binding proteins that readily misfold and thus sequester viral genomes in aggregates. Significant enrichment of viral sequences in AD aggregates, relative to controls, is consistent with roles for integrated viruses in AD susceptibility. The preferential enrichment of RNA over DNA in aggregates may implicate a mechanism specific to transcripts: cotranslational aggregation of polysomes during initial misfolding of nascent polypeptides. This process would likely be quite sensitive to the balance between translation rate and chaperone‐mediated refolding capacity. A critical role of cotranslational entrapment is supported by our observation that shRNA knockdown of the translation elongation factor EEF2, although only 25–35% effective, selectively eliminates at least 90% of RNA in aggregates.

EXPERIMENTAL PROCEDURES

Preparation of cells

Cells were grown in 75‐cm2 flasks, with culture medium comprising Dulbecco's Modified Eagles Medium (DMEM) supplemented with 10% (v/v) fetal bovine serum (FBS) at 37°C, grown in an atmosphere of air supplemented with 5% CO2. Cells were harvested and washed with phosphate buffered saline and then digested with 0.25% (w/v) trypsin (Thermo Fisher) at 37°C for 4 min or until cells detach from the surface.

EEF‐2 knockdown

For EEF‐2 gene knockdown, RNAi knockdown was performed with 3 distinct EEF‐2 shRNA sequences, targeting human EEF‐2 (SASI_Hs01_00212218 and SASI_Hs_0022218 from Millipore‐Sigma; s4493 from Thermo Fisher), each introduced separately into SH‐SY5Y‐APPSw cells. Cells were harvested and replated at a density sufficient to achieve ~80% confluence 72 h later. RNAiMax (Thermo Fisher) was used as the transfection reagent, following the manufacturer's protocol. MISSION shRNA universal negative controls (SIC001 and SIC002, Millipore‐Sigma) were transfected by the same protocol, as negative controls for the EEF‐2 knockdowns. Cells were harvested and flash frozen 72 hours after transfection.

Isolation of sarkosyl‐insoluble aggregates

Aggregates were prepared from Alzheimer's Disease (AD) vs. age‐matched control (AMC) hippocampus; T98G/APOE3 or T98G/APOE4 human glial cell pellets; or SY5Y‐APPSw human neuroblastoma cell pellets. Frozen tissues or cells were pulverized with a mortar and pestle (cooled on dry ice) and suspended in lysis buffer containing 20‐mM HEPES pH 7.4, 0.3‐M NaCl, 2‐mM MgCl2, 1% NP40 (w/v), supplemented with phosphatase and protease inhibitors (CalBiochem). Tissue suspensions were lysed in a Teflon homogenizer (2 times 10 s, at 0°C) and sonicated (3 times 10 s, at 0°C). Samples were centrifuged 5 min at 600 × g to remove debris. Supernatant protein was quantified and each sample (0.6–1.0 mg) was centrifuged 15 min at 13,000 × g. Supernatants (soluble protein) were removed, and to each insoluble pellet the same lysis buffer was added plus 1% (v/v) sarkosyl, and mixed well. Samples were centrifuged 20 min at 100,000  × g; supernatants and pellets were recovered as “sarkosyl‐soluble aggregates” and “sarkosyl‐insoluble aggregates”, respectively.

Immunoprecipitation of amyloid beta and tau aggregates

AD and AMC hippocampal tissue samples were pulverized as described above. After removal of debris (centrifugation for 5 min at 1400 × g), protein was quantified by the Bradford protein assay. Protein was then gently mixed with magnetic beads coated with antibody to either Aβ1–42 (ab11132) or tau (ab80579) for immuno‐pulldown (IP); sarkosyl‐insoluble protein was isolated from the antibody‐bound fractions as described previously (Ayyadevara, Balasubramaniam, Parcon, et al., 2016).

Aggregate contactome generation

Insoluble aggregates isolated from SY5Y‐APPSw cells as above, were cross‐linked following procedures described previously (Balasubramaniam et al., 2019). In brief, purified aggregates were rinsed, cross‐linked with modified click reagents, digested with trypsin, and the linked peptide pairs were affinity purified using streptavidin‐coated beads to capture the biotin‐coupled crosslinking moiety. Cross‐linked peptide pairs were identified from high‐resolution LC/MS‐MS raw data files, using a modified version of Xlink identifier (Balasubramaniam et al., 2019; Du et al., 2011). Xlink identifier outputs were analyzed with the GePhi software package to calculate the degree (number of interacting partners) of each hub. Because high‐molecular‐weight proteins (e.g., titin) have greater potential to interact with other proteins, spectral hits for each hub were normalized, i.e. divided by the length of that hub protein in amino acids. Identified contactome proteins were categorized by degree, as described previously (Balasubramaniam et al., 2019). Proteins with a high normalized degree (number of interacting partners divided by length in amino acids) or classified as hub‐connectors (connecting 2 or more hub proteins that are not otherwise connected) were pursued by further graph modeling; the Cytoscape package (Shannon et al., 2003) was used with default parameters to construct and visualize graphs.

Isolation and quantitation of nucleic acids in aggregates

For sequencing of nucleic acid fragments from isolated aggregates, RNA and DNA were extracted from sarkosyl‐insoluble material isolated from cultured cells, or from AD and AMC hippocampus, using the Qiagen AllPrep kit following manufacturer's instructions and a protocol in which this kit was shown to recover even small nucleic acid fragments (Pena‐Llopis & Brugarolas, 2013). To quantify DNA and RNA trapped in sarkosyl‐insoluble aggregates, nucleic acids were extracted and assayed by multiple protocols, with consistent results. These consisted of (1) separation of RNA and DNA fragments using a Qiagen AllPrep DNA/RNA extraction kit according to the manufacturer's protocol, with recovery assayed by absorbance at 260 nm; (2) separation of RNA and DNA fragments with the Qiagen kit, and quantitation by ethidium bromide and/or SYBR Gold after resolution by acrylamide gel electrophoresis; (3) selective enzymatic digestion with RNAse‐free DNAse (Thermo Fisher, CA) and assay by 260‐nm absorption (RNA directly; DNA by subtraction), and (4) using TRI Reagent (Molecular Research Ctr., TR118) to isolate RNA, DNA, and protein in a single protocol. Figure 2 data were obtained by method (3) above.

RNA‐seq and ChIP‐seq analyses

All RNA‐seq and ChIP‐seq analyses were performed by the UT Southwestern Genomics Core, analyzed using the CLC Genomics Workbench. We employed ChIP‐seq to evaluate DNA‐fragment specificity; thus, the primary analytic value is the number of significant peaks, with peak validity assessed by an E value relative to a flat distribution (peak absence). RNA‐seq was preceded by peak validation, just as for ChIP‐seq. Subsequently, valid‐peak reads that map uniquely to exons (“unique exon reads”) were summed as our expression metric, and were used to determine differential expression between groups. The nucleic acid contig assemblies were quite consistent in size, 579 ± 34 (SD) base pairs in length for DNA peaks, and 291 ± 31 (SD) for RNA‐fragment contigs. The efficiency of ChIP‐seq and RNA‐seq fragment cloning protocols, employed prior to sequencing, is quite sensitive to fragment size. Under normal ChIP‐seq protocols, they would be determined by shearing or sonication, size selection by cloning vector, and/or manual size selection. However, in the case of aggregate nucleic acid fragments, other factors may be influential—such as the size, age, and intracellular location of individual aggregates.

Viral sequence analysis

We employed a modified version of VirTect to scan DNA and RNA fragment sequences from human AD and AMC (age‐matched control) hippocampi. VirTect is a pipeline script that calls a sequence of RNA‐seq pattern‐matching routines (Khan et al., 2019). VirTect retrievals of viral matches to aggregate nucleic acid reads, from 3 AD and 3 AMC brain samples, were filtered using the following parameters (https://github.com/WGLab/VirTect/blob/master/README.md): ≥5x coverage depth, a continuous/contiguous region cutoff of ≥75, and a read count ≥50. Several protocol modifications were made for our pipeline: (1) tophat2 was replaced by hisat2; (2) code was optimized to use all available threads; (3) the internal threshold number of reads was reduced in exploratory runs for the purpose of obtaining AD/AMC read ratios, but recommended thresholds were maintained to eliminate false positives in the assignment of valid hits; (4) the modified script was rewritten in Bash, with unnecessary subroutine calls deleted to reduce run‐time. The database screened by VirTect comprises complete sequences of 757 viruses, as described (Khan et al. 2019). Target sequences were not restricted to human viruses, in recognition of the high frequency of zoonoses and multiple‐host pathogens.

G‐quadruplex analyses

We employed two programs to screen RNA and DNA sequences for G‐quadruplex‐forming regions: (Doluca, 2019) and (Kikin et al., 2006). Both strands were scanned for each DNA‐fragment sequence, but only strands with G4‐forming potential were pursued in subsequent analyses. The following parameters were used for G4CatchAll: G3L (loop limit) was set to 1.3; G2L (allowing 2‐G loops) was set to 1.3; G4H (enables the G4Hunter algorithm for final evaluation). The following parameters were used for QGRS Mapper: Max. Length 30; Min G‐Group 2; Loop size 0 – 36.

Statistical analyses

Inter‐group differences were tested for significance by 2‐tailed Behrens–Fisher heteroscedastic t tests, unless otherwise indicated. These conservative tests are appropriate to small‐sample comparisons in which the intra‐group variance is not well estimated. Comparisons of ratios generally employed Yates chi2 (chi‐squared) nondirectional tests, substituting 2‐tailed Fisher Exact tests as required to meet numerical constraints. This conservative replacement is stated in the text but is not made explicit (line by line) in tables to conserve space.

CONFLICTS OF INTEREST

The authors declare no competing or conflicting interests.

AUTHOR CONTRIBUTIONS

RJSR and SA designed the study; MB and SA performed aggregate cross‐linking studies; MB undertook subsequent data analysis and network modeling; MB and AG performed additional proteomics data analyses and contactome construction; JJ undertook G‐quadruplex and human‐virus searches, and checked all RNA and DNA sequencing data analyses (based on data and initial CLC bio/Qiagen analyses provided by the Genomics Core facility, University of Texas Southwest Medical Center); RA and SA performed EEF2‐shRNA knockdown experiments, and RNA and DNA recovery quantitations; RJSR wrote the manuscript with contributions from SA and MB. All authors read and approved the final manuscript. Fig S1‐S3 Click here for additional data file. Table S1 Click here for additional data file.
  72 in total

Review 1.  An emerging role of RNA-binding proteins as multifunctional regulators of lymphocyte development and function.

Authors:  Martin Turner; Daniel J Hodson
Journal:  Adv Immunol       Date:  2012       Impact factor: 3.543

Review 2.  Pathological implications of nucleic acid interactions with proteins associated with neurodegenerative diseases.

Authors:  Yraima Cordeiro; Bruno Macedo; Jerson L Silva; Mariana P B Gomes
Journal:  Biophys Rev       Date:  2014-01-09

3.  Quantitative relationships between huntingtin levels, polyglutamine length, inclusion body formation, and neuronal death provide novel insight into huntington's disease molecular pathogenesis.

Authors:  Jason Miller; Montserrat Arrasate; Benjamin A Shaby; Siddhartha Mitra; Eliezer Masliah; Steven Finkbeiner
Journal:  J Neurosci       Date:  2010-08-04       Impact factor: 6.167

4.  Neurotropic viruses and Alzheimer's disease: a search for varicella zoster virus DNA by the polymerase chain reaction.

Authors:  W R Lin; I Casas; G K Wilcock; R F Itzhaki
Journal:  J Neurol Neurosurg Psychiatry       Date:  1997-06       Impact factor: 10.154

5.  Human endogenous retrovirus HERV-K113 is capable of producing intact viral particles.

Authors:  Klaus Boller; Kurt Schönfeld; Stefanie Lischer; Nicole Fischer; Andreas Hoffmann; Reinhard Kurth; Ralf R Tönjes
Journal:  J Gen Virol       Date:  2008-02       Impact factor: 3.891

6.  Proteomic analysis of age-dependent changes in protein solubility identifies genes that modulate lifespan.

Authors:  Pedro Reis-Rodrigues; Gregg Czerwieniec; Theodore W Peters; Uday S Evani; Silvestre Alavez; Emily A Gaman; Maithili Vantipalli; Sean D Mooney; Bradford W Gibson; Gordon J Lithgow; Robert E Hughes
Journal:  Aging Cell       Date:  2011-12-11       Impact factor: 9.304

7.  Alteration of dynein function affects α-synuclein degradation via the autophagosome-lysosome pathway.

Authors:  Da Li; Ji-Jun Shi; Cheng-Jie Mao; Sha Liu; Jian-Da Wang; Jing Chen; Fen Wang; Ya-Ping Yang; Wei-Dong Hu; Li-Fang Hu; Chun-Feng Liu
Journal:  Int J Mol Sci       Date:  2013-12-13       Impact factor: 5.923

Review 8.  Could autophagy dysregulation link neurotropic viruses to Alzheimer's disease?

Authors:  Maria Anele Romeo; Alberto Faggioni; Mara Cirone
Journal:  Neural Regen Res       Date:  2019-09       Impact factor: 5.135

9.  Apolipoprotein E4 inhibits autophagy gene products through direct, specific binding to CLEAR motifs.

Authors:  Paul A Parcon; Meenakshisundaram Balasubramaniam; Srinivas Ayyadevara; Richard A Jones; Ling Liu; Robert J Shmookler Reis; Steven W Barger; Robert E Mrak; W Sue T Griffin
Journal:  Alzheimers Dement       Date:  2017-09-22       Impact factor: 21.566

Review 10.  Potential role of viruses in neurodegeneration.

Authors:  K Kristensson
Journal:  Mol Chem Neuropathol       Date:  1992 Feb-Apr
View more
  3 in total

1.  Probing TDP-43 condensation using an in silico designed aptamer.

Authors:  Elsa Zacco; Owen Kantelberg; Edoardo Milanetti; Alexandros Armaos; Francesco Paolo Panei; Jenna Gregory; Kiani Jeacock; David J Clarke; Siddharthan Chandran; Giancarlo Ruocco; Stefano Gustincich; Mathew H Horrocks; Annalisa Pastore; Gian Gaetano Tartaglia
Journal:  Nat Commun       Date:  2022-06-23       Impact factor: 17.694

Review 2.  Emerging Roles of RNA-Binding Proteins in Neurodevelopment.

Authors:  Amalia S Parra; Christopher A Johnston
Journal:  J Dev Biol       Date:  2022-06-10

3.  Machine-learning analysis of intrinsically disordered proteins identifies key factors that contribute to neurodegeneration-related aggregation.

Authors:  Akshatha Ganne; Meenakshisundaram Balasubramaniam; Srinivas Ayyadevara; Robert J Shmookler Reis
Journal:  Front Aging Neurosci       Date:  2022-08-03       Impact factor: 5.702

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.