Petr Klus1,2, Davide Cirillo1,2, Teresa Botta Orfila1,2, Gian Gaetano Tartaglia1,2,3. 1. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain. 2. Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain. 3. Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Passeig Lluís Companys, 08010 Barcelona, Spain.
Abstract
It has been reported that genes up-regulated in cancer are often down-regulated in neurodegenerative disorders and vice versa. The fact that apparently unrelated diseases share functional pathways suggests a link between their etiopathogenesis and the properties of molecules involved. Are there specific features that explain the exclusive association of proteins with either cancer or neurodegeneration? We performed a large-scale analysis of physico-chemical properties to understand what characteristics differentiate classes of diseases. We found that structural disorder significantly distinguishes proteins up-regulated in neurodegenerative diseases from those linked to cancer. We also observed high correlation between structural disorder and age of onset in Frontotemporal Dementia, Parkinson's and Alzheimer's diseases, which strongly supports the role of protein unfolding in neurodegenerative processes.
It has been reported that tumor suppressor p53 has physico-chemical features that are typical of prionoid proteins associated with neurodegenerative diseases1. This finding is particularly interesting because it suggests that common molecular properties can be linked to relatively distant diseases. As a matter of fact, a recent study by Ibáñez et al.2 shows that transcripts up-regulated in cancer are down-regulated in central nervous system (CNS) diseases and vice versa. In line with this finding, a risk reduction for some cancer types has been observed in patients affected by Parkinson’s3 and Alzheimer’s diseases4.
Are there common physico-chemical determinants behind comorbidities? We re-analysed the data published by Ibáñez et al.2 to understand if differential regulation of genes can be associated with specific protein features. While the original study by Ibáñez et al.2 focused on transcripts that are exclusively up-regulated in cancer and down-regulated in CNS diseases and vice versa2, our analysis deals with genes that are exclusively associated with either CNS diseases or cancer (Fig. 1). In agreement with recent experimental findings5 and theoretical analyses6,7,8,9, we investigated the physico-chemical properties of gene products assuming a proportionality between transcript and protein abundances.
Figure 1
Gene sets analysis.
Previous analysis carried out by Ibáñez et al.2 focused on transcripts that are up-regulated in central nervous system (CNS) and down-regulated in cancer or vice versa (i.e., intersection between gene sets). Our study deals instead with sets of genes that are either up-regulated or down-regulated in cancer and CNS diseases (i.e., symmetric difference between gene sets).
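The difference between the two designs can be illustrated with plain set operations. The following minimal Python sketch uses made-up gene identifiers (hypothetical, not taken from the data of Ibáñez et al.) and ignores the direction of regulation for simplicity.

```python
# Hypothetical gene identifiers, for illustration only.
cns_genes = {"GENE_A", "GENE_B", "GENE_C"}
cancer_genes = {"GENE_B", "GENE_C", "GENE_D"}

# Intersection: genes deregulated in both disease classes
# (the kind of overlap studied in the previous analysis).
shared = cns_genes & cancer_genes

# Symmetric difference: genes associated with exactly one
# of the two disease classes (the design of the present study).
exclusive = cns_genes ^ cancer_genes

print(sorted(shared))
print(sorted(exclusive))
```

In this toy case GENE_B and GENE_C fall in the intersection, while GENE_A and GENE_D form the symmetric difference.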
Results
In this work, we used the cleverMachine approach (available at http://www.tartaglialab.com/cs_multi/submission)10 to analyse physico-chemical features of proteins associated with Schizophrenia, Alzheimer’s and Parkinson’s diseases as well as colorectal, lung and prostate cancers. Analysis carried out with the boxplotter algorithm (accessible at http://www.tartaglialab.com/boxplotter/submit; Table S1) reveals that genes up-regulated in CNS disorders code for proteins that are poorly abundant under physiological conditions (human reference proteome)11,12, indicating that expression is significantly increased in the disease state (down-regulated genes follow the opposite trend; Fig. 2A–C). By contrast, genes up-regulated in colorectal, lung and prostate cancers are associated with proteins that are already abundant in the reference proteome (down-regulated genes follow the opposite trend; Fig. 2D–F). The finding that genes associated with different diseases are constitutively expressed at specific levels suggests a link with physico-chemical features of their protein products8,13. As a matter of fact, previous reports indicate that protein abundance is intrinsically constrained by solubility8,9,14,15, unfolded polypeptides are poorly expressed16,17 and nucleic-acid binding proteins are highly abundant18,19 (Table S1).
Figure 2
Expression of CNS and cancer genes at physiological conditions.
Genes up-regulated (UP) in (A) Alzheimer’s, (B) Parkinson’s diseases and (C) Schizophrenia encode proteins that are poorly abundant under normal conditions11,12, while down-regulated genes (DOWN) show the opposite trend. Genes up-regulated (UP) in (D) Colorectal, (E) Lung and (F) Prostate cancer encode proteins that are highly abundant in normal conditions, while down-regulated genes (DOWN) show the opposite trend. As physiological concentrations of proteins are linked to their physico-chemical properties9,16,19, our findings reveal information on intrinsic features of disease-associated genes. The p-values are calculated with the Kolmogorov-Smirnov test.
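As an illustration of the kind of comparison reported in Fig. 2, the two-sample Kolmogorov-Smirnov statistic can be sketched in pure Python. This is not the boxplotter implementation. The function name `ks_two_sample`, the abundance values and the asymptotic p-value approximation (the standard Kolmogorov series, reasonable only for large samples) are assumptions made for the example.

```python
import math

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to v (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    if d == 0.0:
        return 0.0, 1.0
    lam = math.sqrt(n * m / (n + m)) * d
    # Kolmogorov tail: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(q, 0.0), 1.0)

# Hypothetical log-abundances of proteins encoded by UP and DOWN genes.
up_abundance = [1.0, 1.2, 0.8, 0.9, 1.1]
down_abundance = [3.0, 3.2, 2.8, 2.9, 3.1]
d_stat, p_value = ks_two_sample(up_abundance, down_abundance)
print(d_stat, p_value)
```

With completely separated samples, as in this toy input, the statistic reaches its maximum value of 1.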
We found that structural disorder strongly differentiates cancer types and CNS diseases (p-values < 10⁻⁵; http://www.tartaglialab.com/cs_multi/confirm/524/36563b35ee/). Evidence for this conclusion is presented in Fig. 3, where we compared 18000 genes (~75000 protein isoforms) using ten disorder predictors10. For each CNS disease, we found that up-regulated genes are significantly enriched in intrinsically unfolded proteins (17 out of 18 protein sets follow the trend, giving an overall signal strength of 17/18 = 0.94; p-values < 10⁻⁵; Fisher’s exact test; Fig. 3A), while down-regulated genes contain more structured polypeptides (signal strength = 18/18), in agreement with DisEMBL disorder predictions20 (see Materials and Methods). Comparing genes up- and down-regulated in cancer types and CNS diseases, we observed that structural disorder propensity anti-correlates with order-promoting features such as alpha-helix (31 out of 36 predictors show opposite trends, resulting in a score of −31/36 = −0.86) and beta-sheet (−0.91) propensities. Increase in disorder is also significantly associated with depletions in burial (predictors agreement = −0.77), hydrophobicity (−0.55) and membrane propensities (−0.47)21. By contrast, proteins up-regulated in colorectal and lung cancer are enriched in nucleic-acid binding ability (8 out of 12 sets follow the trend, while the remaining 4/12 do not show significant enrichments; Fig. 3B), which is in line with evidence showing that transcription factors such as p53 play a major role in oncogenesis22. Interestingly, prostate cancer shows significant up-regulation of membrane proteins (e.g. NGEP-L), as previously reported in other studies (3 of 6 sets follow the same trend, while the remaining 3/6 do not show significant enrichments; Fig. 3C)23.
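The signal strengths quoted above are simple signed fractions. A minimal sketch of the arithmetic follows, assuming that each comparison is encoded as +1 (follows the trend), −1 (opposite trend) or 0 (no significant enrichment). This encoding and the function name `signal_strength` are illustrative assumptions, not the cleverMachine code, and the comparisons that do not follow the trend are assumed here to be non-significant.

```python
def signal_strength(trends):
    """Signed fraction of comparisons that follow (+1) or oppose (-1) a trend.

    Non-significant comparisons are encoded as 0 (an assumption of this sketch).
    """
    return sum(trends) / len(trends)

# 17 of 18 up-regulated CNS sets enriched in disorder.
disorder_up = [1] * 17 + [0]
# 31 of 36 alpha-helix predictors showing the opposite trend to disorder.
alpha_helix = [-1] * 31 + [0] * 5

print(round(signal_strength(disorder_up), 2))
print(round(signal_strength(alpha_helix), 2))
```

The two calls reproduce the values 0.94 and −0.86 reported in the text.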
Figure 3
Physico-chemical properties of proteins involved in cancers and CNS diseases.
(A) Up-regulation of structurally disordered proteins discriminates between cancer types and central nervous system (CNS) diseases. As indicated by horizontal arrows, proteins up-regulated in CNS are enriched in structural disorder (red dots; down-regulation is associated with the opposite trend); (B) Nucleic-acid binding propensity differentiates CNS diseases from proteins up-regulated in colorectal and lung cancer. Proteins up-regulated in colorectal and lung cancer (vertical arrows; green dots) have increased nucleic acid propensity (down-regulation is associated with decrease). (C) Membrane propensity differentiates between CNS diseases and proteins up-regulated in prostate cancer. Genes up-regulated in prostate cancer show increased membrane propensity (vertical arrow; green dots; down-regulation is associated with opposite trend). Red: a particular CNS disease is enriched with respect to a cancer type in structural disorder (A), nucleic-acid binding propensity (B) or membrane propensity (C); Green: a cancer type is enriched with respect to a particular CNS disease in structural disorder (A), nucleic-acid binding propensity (B) or membrane propensity (C); Yellow: non-significant enrichment; Each enrichment is associated with a p-value < 10⁻⁵ calculated with Fisher’s exact test; AD: Alzheimer’s disease; PD: Parkinson’s disease; SCZ: Schizophrenia; CRC: Colorectal cancer; LC: Lung cancer; PC: Prostate cancer; UP/DOWN: over/under-expression with respect to healthy control samples.
Gene Ontology (GO) analysis of up-regulated genes indicates that proteins containing disordered regions are associated with increased aggregation (Alzheimer’s disease: “identical protein binding”, p-value = 10⁻⁵) and misfolding propensities (Parkinson’s disease: “activation of signaling protein activity involved in unfolded protein response”, p-value = 10⁻⁴; Fig. 4; Schizophrenia: “response to unfolded protein”, p-value = 10⁻³). Interestingly, a group of disordered proteins with DNA-/RNA-binding ability is up-regulated in colorectal (“RNA processing”, p-value = 10⁻⁹), lung (“DNA repair”, p-value = 10⁻⁴) and prostate cancers (“ribonucleoprotein complex”, p-value = 10⁻⁵). In addition, disordered proteins are found in pathways involving p53 (e.g. colorectal cancer: “DNA damage response, signal transduction by TP53 class mediator resulting in cell cycle arrest”, p-value = 10⁻⁵).
Figure 4
Protein disorder is linked to neurodegeneration.
Intrinsically disordered proteins are associated with Gene Ontology (GO) labels that are significantly enriched (p-value < 10⁻⁴) in terms such as “unfolded protein response” (the example shown refers to Parkinson’s disease genes).
GO annotations suggest that proteins containing disordered regions are abundant in colorectal, lung and prostate cancers, although their enrichment is less significant than in Schizophrenia, Alzheimer’s and Parkinson’s diseases. To test this hypothesis, we generated random groups of human genes (same number of proteins as in the original sets) and compared their features with those of cancers and CNS diseases. We found that structural disorder is indeed enriched in both up-regulated and down-regulated cancer proteins (19 out of 36 down- and up-regulated sets follow the trend and 13/16 do not show significant enrichments; p-values < 10⁻⁵; Figure S1A), although the signal is stronger for Schizophrenia, Alzheimer’s and Parkinson’s diseases (18/18 up-regulated gene sets are enriched in disorder and 16/18 down-regulated sets are depleted; Figure S1B), in agreement with our original findings (Fig. 3A). We also observed that nucleic acid propensities are enriched in cancers (15/18 sets show significant increase and three are non-significantly enriched) and CNS diseases (15/18 sets have significant increase and one is non-significantly enriched), but signal strength is higher for cancers (Fig. 3B).
To further investigate the intimate connection between CNS diseases and structural disorder, we analysed 428 mutations of proteins involved in Frontotemporal Dementia, Alzheimer’s and Parkinson’s diseases (available at http://www.molgen.ua.ac.be/ADMutations/ and http://www.molgen.vib-ua.be/PDMutDB/). We observed a strong correlation (Pearson’s correlation = −0.9; p-value < 10⁻³) between age of onset and disorder24, which, in agreement with GO analysis, indicates that reduction in folding efficiency is a key factor in neurodegeneration (Fig. 5). 
In line with this observation, previous reports indicate that intrinsically unfolded proteins such as α-synuclein (Parkinson’s disease25), Aβ42 (Alzheimer’s disease26) and DISC1 (Schizophrenia27) cause neuronal damage by assembling into amyloid fibrils. As proteomic analyses indicate that amyloid-forming proteins have an intrinsic propensity to attract disordered proteins26, it is possible that neurotoxicity arises from direct co-aggregation of proteins that have unfolded regions available for promiscuous interactions. Thus, up-regulation of disordered proteins might be the consequence of a cellular response to compensate for progressive sequestration in amyloid deposits. To investigate this hypothesis, we compared proteins sequestered by amyloid fibrils26 and those deregulated in Alzheimer’s disease. The cleverMachine analysis10 indicates that proteins binding to amyloid aggregates are not physico-chemically dissimilar to those up-regulated in Alzheimer’s disease (see http://www.tartaglialab.com/cs_multi/cc_runs/622/; Figure S3), which strengthens the link between misfolding and neurodegeneration. In line with these findings, very recent reports showed that increase in protein insolubility is associated with massive accumulation of natively unfolded proteins28.
Figure 5
Structural disorder is associated with onset of neurodegenerative diseases.
In Frontotemporal Dementia, Alzheimer’s and Parkinson’s diseases, structural disorder is significantly anti-correlated with age of onset (correlation = −0.90; p-value < 10⁻³). A total of 428 mutations and their relative ages of onset, grouped with a 2.5-year window, have been used for the analysis. Representative genes have been selected to illustrate individual trends (other genes are shown in black): APP, CHMP2B (red), FUS, GRN, LRRK2 (blue), MAPT, PARK2 (yellow), PARK7, PINK1, PSEN1 (gray), PSEN2 (purple), SNCA, TARDBP (pink) and VCP.
Conclusions
It has been shown that structurally disordered proteins are tightly regulated by the cell29,30 and their uncontrolled over-expression triggers pathological conditions such as cardiovascular diseases and diabetes31. In this study, we reported the finding that genes up-regulated in CNS diseases are more enriched in disordered protein products than cancer genes, which has important implications for the etiopathogenesis of neurodegenerative diseases. As a matter of fact, changes in the abundance of unfolded proteins induce re-wiring of protein networks and promote formation of aberrant interactions32 leading to association with amyloid deposits26. As genes up-regulated in prostate, colorectal and lung cancer code for proteins that are less disordered than those up-regulated in CNS diseases and more unfolded than those down-regulated in CNS diseases, we cannot exclude the possibility that structural disorder might play a role in cancer, although to a lesser extent. Indeed, unregulated promiscuity of unfolded proteins can trigger fatal events leading to cell death signalling29. For instance, in the case of the Bcl-2 family of apoptosis regulators, aberrant expression of intrinsically disordered proteins can determine different cell fate decisions through alteration of interaction networks33 (we note that Bcl-2 is up-regulated in CNS disorders and down-regulated in cancer2).
Our results do not indicate that aggregation is uniquely linked to neurodegeneration. Indeed, although amyloid fibrils sequester natively unfolded proteins26, which are particularly abundant in brain regions34,35, some cancer types are associated with protein aggregation36 and protein deposits influence cell survival in the context of several tumors, especially those that are metastatic. For example, co-aggregation of toxic amyloid-β peptide (Aβ) and TGF-β-induced antiapoptotic factor (TIAF1) is a hallmark of metastatic cancer cell mass37,38. 
Expression levels of TIAF1 vary throughout the metastatic spread, being up-regulated in developing tumors and down-regulated in established metastatic cancer cells37. In a number of cases, aggregation of specific gene products is associated with both CNS diseases and cancer types. For instance, aggregation of superoxide dismutase SOD1 causes cellular death in amyotrophic lateral sclerosis39. Yet, SOD1 also has a role in breast cancer and an ability to augment estrogen-responsive gene expression40. Similarly, the DNA-binding domain of p53 is conformationally unstable and the majority of disease mutants are known to increase structural disorder41. Upon aggregation, mutant p53 not only induces misfolding and co-aggregation of wild-type p53, but also of its paralogues p63 and p73 into cellular inclusions, causing inefficient transcription of target genes, which, in turn, is crucial for cell growth control and apoptosis42.
In conclusion, our analysis is one of the first attempts to illustrate how an epidemiological observation on inverse comorbidities2 can be rationalized in terms of physico-chemical features of proteins encoded by deregulated genes. We cannot exclude that additional factors, including age of disease onset and drug treatment, could influence the expression patterns associated with disease. As a matter of fact, drugs used in the treatment of neurodegenerative diseases, such as thioridazine43, have been shown to display anti-tumor effects, while anti-tumor drugs, such as cyclin-dependent kinase inhibitors44 and mithramycin45, are neuro-protective. Yet, these findings reinforce the existence of a link between cancer and CNS diseases and indicate that future studies will have to focus on specific molecular pathways46.
Materials and Methods
Gene sets were taken from the paper by Ibáñez et al.2: Alzheimer’s disease (AD); Parkinson’s disease (PD); Schizophrenia (SCZ); Colorectal cancer (CRC); Lung cancer (LC); Prostate cancer (PC). Results can be accessed at http://www.tartaglialab.com/cs_multi/confirm/524/36563b35ee/. Examples of our calculations are at http://www.tartaglialab.com/cs_multi/confirm/240/6be82069c3/. Comparison with random sets can be found at http://www.tartaglialab.com/cs_multi/confirm/576/ef217f98eb/ (CNS diseases) and http://www.tartaglialab.com/cs_multi/confirm/602/cfc3e02cdc/ (cancers). Classification of disordered proteins interacting with amyloid fibrils is available at http://www.tartaglialab.com/cs_multi/cc_runs/622/.
cleverMachine
The cleverMachine (CM) algorithm analyses physico-chemical properties of two protein datasets10. The tool creates profiles, or physico-chemical signatures, for each protein, utilizing a large set of features - both experimentally and statistically derived from other tools. In our analysis we used a number of physico-chemical properties (hydrophobicity, alpha-helix, beta-sheet, disorder, burial, aggregation, membrane and nucleic acid-binding propensities) and 10 propensity predictors per feature. Only differentially enriched properties were used in the calculations. Further information can be found at http://s.tartaglialab.com/page/clever_suite.
multiCleverMachine analysis
The multiCleverMachine (multiCM) extends the concept of binary comparisons used in CM by introducing more set groups. After submission of one or more inputs for signal and one or more inputs as negative group, the multiCM creates a CM run for each possible combination of elements from the signal and negative sets. The result is presented in an easy-to-read format, allowing at-a-glance interpretation of the CM submissions (Fig. 1). Each of the individual CM runs is linked on the multiCM page, allowing further in-depth analysis. The multiCM provides visualisation of enrichment strengths per group, making it easy to see for which groups properties such as disorder or alpha-helical propensity are enriched. Details about this new method are available at http://www.tartaglialab.com/cs_multi/submission.
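The pairing scheme described above amounts to a Cartesian product of the signal and negative groups. The sketch below shows only this enumeration. `run_cm` is a hypothetical placeholder rather than the multiCM interface, and the assignment of CNS diseases to the signal group and cancers to the negative group is arbitrary.

```python
from itertools import product

# Abbreviations as used in the figures; group assignment is arbitrary here.
signal_sets = ["AD", "PD", "SCZ"]
negative_sets = ["CRC", "LC", "PC"]

def run_cm(signal, negative):
    """Hypothetical placeholder for a single binary CM comparison."""
    return {"signal": signal, "negative": negative}

# One CM run for each (signal, negative) combination.
runs = [run_cm(s, n) for s, n in product(signal_sets, negative_sets)]

for run in runs:
    print(run["signal"], "vs", run["negative"])
```

Three signal sets and three negative sets give nine binary comparisons.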
DisEMBL analysis
In order to validate our CM analysis, we used DisEMBL20 (http://dis.embl.de). As DisEMBL provides disorder profiles for each of the properties, the analysis was carried out as follows. For each of the profiles, we calculated the proportion of the sequence that was above the significance threshold defined by the authors, which yielded a strength score for each individual entry. The scores were then averaged to compare individual sets. To visualize strength comparisons, we used the same set of colors as described in Fig. 1 (see multiCleverMachine analysis): if the set on the left (cancer) has enrichment, the color is green, and red otherwise. Our results are available at http://www.tartaglialab.com/static/2014/disembl_analysis.html.
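The scoring procedure described above can be sketched as follows. The per-residue profiles and the threshold of 0.5 are made-up values for illustration. The actual DisEMBL thresholds are those defined by its authors and are not reproduced here, and the function names are hypothetical.

```python
def disordered_fraction(profile, threshold):
    """Proportion of residues whose score exceeds the threshold."""
    return sum(1 for score in profile if score > threshold) / len(profile)

def set_strength(profiles, threshold):
    """Average the per-protein fractions over a set of proteins."""
    fractions = [disordered_fraction(p, threshold) for p in profiles]
    return sum(fractions) / len(fractions)

# Hypothetical per-residue disorder profiles for two proteins.
protein_set = [
    [0.1, 0.6, 0.7, 0.2],
    [0.9, 0.8, 0.7, 0.1],
]
print(set_strength(protein_set, threshold=0.5))
```

Two sets can then be compared through their averaged scores, for example by checking which of the two values is larger.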
Age of onset analysis
We downloaded all single-point amino acid mutations and associated ages of onset from http://www.molgen.ua.ac.be/ADMutations/ and http://www.molgen.vib-ua.be/PDMutDB/. Structural disorder was measured using the B-value propensity scale (linearly normalized between 0 and 1)24. For each protein in the dataset, we averaged the disorder propensity over the sequence, as described in our previous publication (values > 0.2)10. The relationship between age of onset (AGE) and structural disorder (SD) was assessed with a sigmoidal function using Z-normalized values for SD (correlation = −0.90; Fig. 5). Using linear regression, the correlation between SD and AGE is −0.87.
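The grouping and correlation steps can be illustrated with a short pure-Python sketch. It assumes that each record already carries an averaged disorder score, groups records into 2.5-year windows by flooring the age, and computes Pearson's correlation on the window means. The records, the binning rule and the function names are illustrative assumptions. The sigmoidal fit and the B-value scale itself are not reproduced.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def bin_by_age(records, window=2.5):
    """Average (age, disorder) records within consecutive age windows."""
    bins = defaultdict(list)
    for age, disorder in records:
        bins[math.floor(age / window)].append((age, disorder))
    ages, disorders = [], []
    for key in sorted(bins):
        group = bins[key]
        ages.append(sum(a for a, _ in group) / len(group))
        disorders.append(sum(d for _, d in group) / len(group))
    return ages, disorders

# Hypothetical (age of onset, mean disorder propensity) records.
records = [(40, 0.9), (41, 0.8), (50, 0.6), (51, 0.5), (60, 0.3), (61, 0.2)]
ages, disorders = bin_by_age(records)
print(pearson(ages, disorders))
```

In this toy input the disorder score falls linearly with age, so the correlation is close to −1. Real data would be expected to show more scatter.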
Additional Information
How to cite this article: Klus, P. et al. Neurodegeneration and Cancer: Where the Disorder Prevails. Sci. Rep. 5, 15390; doi: 10.1038/srep15390 (2015).