Stephanie A Zlatic1, Duc Duong2, Kamal K E Gadalla3, Brenda Murage3, Lingyan Ping2, Ruth Shah4, James J Fink5, Omar Khwaja6, Lindsay C Swanson6, Mustafa Sahin6, Sruti Rayaprolu7, Prateek Kumar7, Srikant Rangaraju7, Adrian Bird4, Daniel Tarquinio8, Randall Carpenter9, Stuart Cobb3, Victor Faundez1.
1. Departments of Cell Biology, Emory University, Atlanta, GA 30322, USA.
2. Departments of Biochemistry, Emory University, Atlanta, GA 30322, USA.
3. Simons Initiative for the Developing Brain, Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK.
4. The Wellcome Centre for Cell Biology, University of Edinburgh, Michael Swann Building, King's Buildings, Max Born Crescent, Edinburgh EH9 3BF, UK.
5. Q-State Biosciences, Cambridge, MA 02139, USA.
6. Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital, Boston, MA 02115, USA.
7. Departments of Neurology, Emory University, Atlanta, GA 30322, USA.
8. Center for Rare Neurological Diseases, Norcross, GA 30093, USA.
9. Rett Syndrome Research Trust, Trumbull, CT 06611, USA.
Abstract
MECP2 loss-of-function mutations cause Rett syndrome, a neurodevelopmental disorder resulting from a disrupted brain transcriptome. How these transcriptional defects are decoded into a disease proteome remains unknown. We studied the proteome of Rett cerebrospinal fluid (CSF) to identify a consensus Rett proteome and ontologies shared across three species. Rett CSF proteomes enriched proteins annotated to HDL lipoproteins, complement, mitochondria, citrate/pyruvate metabolism, synapse compartments, and the neurosecretory protein VGF. We used shared Rett ontologies to select analytes for orthogonal quantification and functional validation. VGF and ontologically selected CSF proteins had genotypic discriminatory capacity as determined by receiver operating characteristic analysis in Mecp2-/y and Mecp2-/+. Differentially expressed CSF proteins distinguished Rett from a related neurodevelopmental disorder, CDKL5 deficiency disorder. We propose that Mecp2 mutant CSF proteomes and ontologies inform putative mechanisms and biomarkers of disease. We suggest that Rett syndrome results from synapse and metabolism dysfunction.
The cellular and molecular understanding of neurodevelopmental disorders has been greatly advanced by the study of single gene defects (Sztainberg and Zoghbi, 2016; Lee et al., 2020). Among these monogenic neurodevelopmental disorders, Rett syndrome, caused by mutations in MECP2, stands out because of its severity and developmental regression. The molecular function of MECP2 as an epigenetic chromatin regulator is well defined (Lyst and Bird, 2015), affecting the expression of a vast number of RNAs in the brain (Lyst and Bird, 2015; Banerjee et al., 2019; Sandweiss et al., 2020; Johnson et al., 2017). The molecular complexity of Rett syndrome is compounded by the extensive and varied modifications of coding and non-coding transcriptomes across brain cell types and regions (Chahrour et al., 2008; Raman et al., 2018; Cholewa-Waclaw et al., 2019; Wu et al., 2010; Cheng et al., 2014). This fact makes transcriptional prediction of Rett syndrome proteomes a complex and uncertain endeavor. Thus, we focus on the proteome to identify biochemical and cellular alterations in Rett syndrome brains (Mullin et al., 2013). Moreover, known Rett syndrome phenotypes such as synaptic, circuit, and behavioral alterations ultimately originate in alterations of protein expression and function (Chao et al., 2007, 2010; Banerjee et al., 2019). Thus, the proteome decodes Mecp2-dependent transcriptional changes and executes a diseased phenome. Despite these advantages of the proteome to illuminate pathogenic mechanisms and to identify disease biomarkers, few studies examine how the proteome is modified in Rett syndrome brains (Matarazzo and Ronnett, 2004; Pacheco et al., 2017).
Our goal was to identify a Rett proteome capable of distinguishing normal and disease brain states across neurodevelopment that is clinically accessible for sample collection, diagnostic testing, and assessing treatment outcomes.
We focused on CSF as a clinically accessible sample whose composition is dictated by the brain and neurodevelopment (Chau et al., 2015; Kaiser et al., 2019; Lehtinen and Walsh, 2011). CSF carries neurodevelopmental instructive signals and nutrients and accrues secretions and metabolites that reflect functional states of diverse cell types in brain parenchyma, the choroid, and the ependymal cells (Johanson et al., 2008). For example, composition of the CSF proteome in individuals with Alzheimer’s predicts key diagnostic molecular pathology in the Alzheimer’s brain (Olsson et al., 2016; Higginbotham et al., 2020; Johnson et al., 2020). Thus, the CSF proteome has the potential to inform us about brain-wide normal and pathological states.
The study of the CSF in genetic or sporadic forms of neurodevelopmental disorders lags behind similar studies in neurological diseases. Fewer than a handful of studies analyze the proteome of this biofluid (Abbasi et al., 2021). The study of the CSF in neurodevelopmental disorders has mostly been limited to targeted studies of a few analytes such as cytokines, growth factors, neuropeptides, or metabolites (Zimmerman et al., 2005; Riikonen et al., 2006; Oztan et al., 2020, 2021; Budden et al., 1990; Matsuishi et al., 1994). Thus, we do not know if a neurodevelopmental disorder, such as Rett syndrome, reproducibly and distinctively modifies the CSF proteome to predict disease mechanisms and biomarkers of disease. Here, we address this fundamental question by comprehensively and unbiasedly exploring the CSF proteome of three species carrying mutations in MECP2/Mecp2. We defined a consensus proteome and ontologies predictive of Rett syndrome disease mechanisms.
Analytes found in the Mecp2-sensitive proteome behaved with sufficient sensitivity and specificity to act as Rett syndrome biomarkers and to discriminate Rett syndrome CSF from a CDKL5 deficiency disorder CSF, a phenotypically related syndrome (Evans et al., 2005; Weaving et al., 2004; Zerbi et al., 2021). This is the first multispecies study of the CSF proteome in a monogenic disorder of neurodevelopment. Based on our proteomic analysis, we propose that Rett syndrome is a synaptic and metabolic neurodevelopmental disorder. Further, our experimental strategy offers a platform for the identification of proteomes and biomarkers in the CSF of any childhood genetic neurological disorder and to infer putative mechanisms of disease.
Results
The CSF proteome is composed of proteins secreted by conventional and non-conventional secretory pathways, exosomes, and ectosomes from brain (Kalluri and LeBleu, 2020). We collectively refer to these proteins as the brain secreted proteome. We sought to identify secreted proteins sensitive to MECP2/Mecp2 gene defects. To achieve this goal, we designed a multipronged strategy to quantify secreted proteomes from wild type and MECP2-null neuron conditioned media, cerebrospinal fluid (CSF) from Mecp2 null male rat and mouse models, and the CSF from female individuals with Rett syndrome collected before and after recombinant IGF-1 treatment (Figure 1A) (O'Leary et al., 2018; Tropea et al., 2009). We reasoned that overlapping proteins and ontologies across diverse experimental systems would identify robust proteins and ontologies to inform putative disease mechanisms and Rett syndrome biomarkers. Furthermore, we designed our studies with an emphasis on replicability across experimental sites of sample collection and measurement, quantification platforms, and species in order to inform biomarker selection (Figure 1A). We chose to quantify proteomes with tandem mass tagging (TMT) mass spectrometry as a high precision method (O'Connell et al., 2018; Li et al., 2012). TMT datasets were analyzed by fold of change/p value volcano thresholding plus machine learning approaches (Figure 1A).
Figure 1
The secreted proteome of post-mitotic MECP2 mutant human neurons
(A) Diagram shows three species and experimental systems. Cohorts represent independent collections of samples with the strategy used for analyte quantification, the place of sample collection, and sample measurement location. Isotopol refers to AQUA and modified AQUA strategies, LFQ corresponds to label free quantification, and TMT denotes tandem mass tagging.
(B and C) Silver stain of two formulations of media conditioned by wild type and MECP2-null differentiated LUHMES cells, a post-mitotic human neuron line (Scholz et al., 2011). Lane 1 in B and C represent naive non-conditioned media. Lanes 2 and 3 depict conditioned media by wild type and mutant cells. B presents experiments performed with commercial N2 supplement. C shows experiments where the N2 supplement was custom generated from high grade purity reagents.
(D–F) Present volcano plots of TMT mass spectrometry experiments with thresholds at log2 of 0.5-fold of change in protein abundance and a p value <0.05. Symbol color represents fold of change in linear scale (see insert). (D) presents a comparison between protein hits obtained by comparing media conditioned by wild type neurons and non-conditioned media. All hits to the right of the X axis correspond to proteins secreted by neurons (n = 5). (E) shows a comparison of the wild type and MECP2 mutant secreted proteome. All hits to the right correspond to proteins whose expression is higher in wild type than in mutant cells (n = 5). (F) shows the total cellular proteome of wild type and MECP2-null cells used in (E), n = 3.
(G–I) Show clustered heat maps of hits selected in D to F. Arrows mark some cardinal differentially expressed proteins. Rows are depicted as minimum and maximum intensities (blue-yellow scale) and annotated by log2 fold of change (rainbow scale, see Table S1).
(J–L) Analysis of TMT data in panel E using an XGBoost machine learning algorithm. (J) presents the main hits discriminating wild type and MECP2 mutant conditioned media in the decision tree. Asterisks mark proteins identified both by volcano thresholding and machine learning. (K and L) show the performance of the machine learning protocol estimated by ROC analysis in (K) and by confusion matrix in (L). Confusion matrix refers to the percentage of samples assigned to a genotype. Area under the curve in K = 0.97 ± 0.12.
(M) Venn diagram of the overlap between hits found in conditioned media in panel E and cellular hits in F, p value calculated with exact hypergeometric probability.
(N) Shows the % overlap between the cellular proteome ontologies inferred from the datasets shown in (E and F) calculated with the ClueGo application. p value estimated with exact hypergeometric probability Bonferroni corrected.
See extended data in Table S1.
The secreted proteome of a MECP2 deficient neuronal cell line
We began using cultured human neurons that were differentiated from the immortalized mesencephalic neuronal cell line LUHMES (Scholz et al., 2011) in which the MECP2 gene was edited by CRISPR/Cas9 (Shah et al., 2016). We reasoned the secreted proteome of a single cell type would define cell autonomous protein candidates for cell-type annotation of hits obtained in CSFs from individuals with Rett syndrome and rodent Mecp2-mutant models. We characterized the culture media before and after cell conditioning (Figure 1B, compare lanes 1 with 2–3). The protein complexity of media alone prevented the identification of proteins contributed by differentiated neurons. The source of protein contaminants was commercial N2 supplements; thus, we customized an N2 supplement starting from high purity reagents. Our customized N2 allowed us to distinguish proteins contributed by either wild type or mutant neurons (Figure 1C, compare lanes 1 with 2–3). TMT mass spectrometry of media alone identified 704 proteins (Figures 1D and 1G). In contrast, cell-conditioned media revealed 958 additional proteins contributed by wild type cells (Figures 1D and 1G). Next, we used this custom media formulation to compare the proteome of wild type and MECP2 null cells. We identified 63 upregulated and 155 downregulated proteins in MECP2 mutant cells by p value and fold-of-change volcano thresholding (Figures 1E and 1H). Prominent downregulated proteins in MECP2 mutant cells included two apolipoproteins (APOC2 and Clusterin, CLU or APOJ), the nucleoporin component AHCTF1, the mitochondrial protein COQ9, as well as factors implicated in citrate cycle and glycolysis such as PDHB and OGDH (Rasala et al., 2006; Rath et al., 2021). To these volcano selected hits, we added 10 additional proteins from a total of 16 proteins whose expression was sensitive to the MECP2 mutation as defined by machine learning (Figure 1J, asterisks for common volcano and machine learning hits) (Torun et al., 2021). 
Among these 10 new proteins were the mitochondrial proteins TOMM22 and the E2 component of the pyruvate dehydrogenase complex (DLAT) as well as the apolipoprotein APOA2. The performance of the machine learning algorithm was evaluated by receiver operating characteristics (ROC) analysis with an area under the curve of 0.97 (Figure 1K) and confusion matrix analysis where predicted and actual genotypic classes closely matched (Figure 1L).
We asked whether the MECP2 secreted proteome was a reflection of the MECP2 cellular proteome. The MECP2 cellular proteome was represented by 187 proteins (Figures 1F and 1I). The overlap between these two MECP2 sensitive proteomes was minor and barely significant (Figure 1M). We found just six common hits, among them SST and IGFBPL1 (Figure 1M). Convergence between these two datasets became evident in few significant ontological categories shared between the MECP2 secreted and cellular proteomes, one of them pyruvate metabolic process, a mitochondrial ontology (GO:0006090, p = 3.96 × 10⁻⁵ Bonferroni corrected, Figure 1N). These data suggest that ontologies may be better positioned than isolated proteomic hits to identify convergence between the secreted and cellular proteome in MECP2 gene mutations within a simple cellular system.
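The volcano thresholding used throughout this section can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used in the study: it assumes that the cutoffs are applied as an absolute log2 fold of change of at least 0.5 together with p < 0.05, that fold of change is expressed as mutant over wild type, and that p values have already been computed. The `Measurement` type, the `volcano_hits` function, and the protein names and values are hypothetical.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    protein: str
    log2_fold_change: float  # assumed convention: log2(mutant / wild type)
    p_value: float           # assumed to be precomputed per protein


def volcano_hits(measurements, log2_fc_cutoff=0.5, p_cutoff=0.05):
    """Return (upregulated, downregulated) protein names passing both cutoffs."""
    up, down = [], []
    for m in measurements:
        if m.p_value >= p_cutoff:
            continue  # not significant, regardless of fold of change
        if m.log2_fold_change >= log2_fc_cutoff:
            up.append(m.protein)
        elif m.log2_fold_change <= -log2_fc_cutoff:
            down.append(m.protein)
    return up, down


# Hypothetical values for illustration only; not data from this study.
example = [
    Measurement("PROT_A", 1.2, 0.001),
    Measurement("PROT_B", -0.8, 0.02),
    Measurement("PROT_C", 0.9, 0.20),    # fails the p value cutoff
    Measurement("PROT_D", -0.3, 0.001),  # fails the fold of change cutoff
]
up_hits, down_hits = volcano_hits(example)
print(up_hits, down_hits)
```

Proteins that change modestly but consistently fall outside these cutoffs, which is one reason the volcano selection is complemented with machine learning classifiers.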
Secreted proteomes of MECP2/Mecp2 deficient cerebrospinal fluids in three species
In order to identify CSF proteomes and/or ontologies that are robust and convergent at the intra- and inter-species level, we analyzed by TMT mass spectrometry the CSF from wild type and Mecp2 mutants in two rodent species and from Rett syndrome individuals. We performed studies in rats aged 25 days (Figures 2A–2C) and in a large cohort of wild type and Mecp2 null mice aged 6 weeks (Figures 2D–2F). We identified 70 and 64 CSF proteins whose expression was downregulated in mutant rat and mouse CSF, respectively (Figures 2A, 2B, 2D, and 2E). These mutant CSF downregulated proteins prominently converged on subunits of high-density lipoprotein particles such as Apom, Apoa1, Apoh, and Pon1. Similarly, we identified Apoa1, Apoc1, Apoc2, and Apoe as downregulated hits in mouse Mecp2 mutant CSF. An additional category of proteins downregulated in both species were proteins belonging to the complement and coagulation cascades (Figures 2A, 2B, 2D, and 2E). To assess intraspecies robustness, we confirmed these rat CSF hits in an independent cohort of wild type and mutant rats using an orthogonal label-free mass spectrometry quantification procedure (LFQ) (O'Connell et al., 2018). This analysis identified 44 proteins whose expression was affected in mutant CSF (Figure S1A) and confirmed the downregulation of apolipoproteins and complement factors in Mecp2-null CSF (Figures S1A–S1C, Apoa4, Apob, Apoc3, Pon1, and C9). Rodent apolipoprotein and complement cascade hits could not be attributed to blood contamination of the CSF as evidenced by albumin, immunoglobulins, or hemoglobin species, which failed to co-cluster with apolipoproteins and complement (Figures 2B, 2E, S1A, and S1B). Synaptic proteins were prominent among factors upregulated in the mouse Mecp2 mutant CSF (Figures 2D and 2E). These synaptic proteins included, but were not limited to, Snap25, Stx1b, Stxbp1, Syn1, and Syn2 (Figures 2D and 2E) (Koopmans et al., 2019).
Finally, we also identified mitochondria proteins within both the significant up- and downregulated proteomes in mutant rat and mouse CSF (Figures 2D and 2E).
Figure 2
The Rett syndrome CSF proteome across three species
(A) Volcano plot of TMT mass spectrometry determinations in rat cerebrospinal fluid. Cutoffs at log2 of 0.5-fold of change in protein abundance and a p value <0.05 n = 5.
(B) Shows clustered heat maps of hits selected in A. For A and B, see legend to Figure 1 for additional details.
(C) Depicts volcano plot of rat cortices analyzed by TMT mass spectrometry. Shown are hits selected by p value <0.05 and log2 change of 0.5 n = 5.
(D and E) Show mouse CSF TMT volcano plot and heatmap of selected hits at cutoffs of log2 0.5-fold of change in protein abundance and a p<0.05. n = 16 wild type and 14 Mecp2-null mice.
(F) Analysis of TMT data presented in panel D using an XGBoost machine learning algorithm. Main hits discriminating wild type and Mecp2 mutant CSF proteomes in decision tree are shown. Inserts show performance of the machine learning protocol estimated by ROC analysis and confusion matrix. Area under the curve in insert ROC analysis 0.92 ± 0.11. Confusion matrix refers to the percentage of samples assigned to a genotype.
(G) Top Venn diagram shows the overlap between Mecp2-sensitive rat CSF and rat cortex hits using thresholding criteria p<0.05 and log2 fold of change of 0.5. Bottom Venn diagram compares Mecp2-sensitive hits in rat and mouse CSF pooled together with Mecp2-sensitive rat cortex hits.
(H) Depicts the correlation in expression of Mecp2-sensitive hits in rat CSF and rat cortex.
(I and J) Show Rett syndrome female individual CSF TMT volcano plot and heatmap of selected hits at log2 of 0.5-fold of change in protein abundance and a p<0.05 comparing before and after IGF-1 treatment. n = 10 before treatment and 9 after treatment.
(K) Analysis of TMT data presented in panel I using an AdaBoost machine learning algorithm. Main hits discriminating CSF before and after treatment in decision tree are shown. Inserts show performance of the machine learning protocol estimated by ROC analysis and confusion matrix. Area under the curve in insert ROC analysis 0.70 ± 0.26. Confusion matrix refers to the percentage of samples assigned to a treatment group.
See Tables S2 and S3.
We further scrutinized mouse and rat CSF datasets using machine learning algorithms. We sought to identify additional proteins sensitive to Mecp2 deficiency that could otherwise escape detection by volcano thresholding. In addition, we reasoned proteins categorized as apolipoproteins, synaptic, complement-related, or mitochondria-annotated should emerge as priority hits in non-linear decision trees segregating wild type and Mecp2 mutant CSFs. Both mouse and rat CSF machine learning analyses identified complement factors, mitochondrial proteins (Atpaf1, Clybl, Coq9, and Mrpl23), as well as synaptic proteins as priority hits (Actr2, Atp2b2 and Nefh, Figures 2F and S1C–S1E). We validated the performance of machine learning approaches by asking whether they could identify MECP2 as a priority protein hit in a proteome dataset of wild type and Mecp2 mutant rat cortex (Figure S1D). We identified Myg1, a mitochondrial protein (Grover et al., 2019), and MECP2 as the top two most important classifiers to discern between wild type and Mecp2 mutant brain tissue (Figure S1D). These machine learning analyses performed to different degrees as determined by area under the curve in ROC analysis (0.72–0.92, Figures 2F and S1) and/or confusion matrices (Figures 2F and S1). Thus, interrogation of CSF proteome datasets with boosting mathematical algorithms provides similar answers as to protein families enriched in rat and mouse mutant CSFs.
The above-described changes to the Mecp2 mutant CSF could closely parallel brain proteome modifications. Alternatively, changes to the CSF proteome could be in proteins different from those in the Mecp2 brain proteome, yet with both Mecp2 proteomes representing alterations in the same compartment or pathway, a converging ontology.
To address this question, we examined the cortex proteome of Mecp2 mutant rats from which we simultaneously collected CSF. Volcano thresholding by p value and fold of change identified 83 proteins whose expression was modified in Mecp2 mutant cortex among 6,752 proteins quantified by TMT (Figure 2C). None of the Mecp2-sensitive cortical proteins overlapped with rat CSF hits (Figure 2G). Correlation analysis searching for CSF hits in Mecp2 mutant rats that were also quantified in rat cortices found 67 proteins shared between the rat cortical dataset and the rat CSF Mecp2-sensitive proteome. These 67 proteins showed no correlation in their expression levels (Figure 2H). These results show that there is a limited capacity to predict specific protein candidates in Mecp2 mutant CSF from brain proteomes and vice versa.
The Mecp2 mutant rodent CSF differs from the human Rett CSF in that the former represents brain tissue homogeneously deficient in Mecp2 protein. In contrast, the human CSF proteome reports a genetically mosaic female brain (Lyst and Bird, 2015; Banerjee et al., 2019; Sandweiss et al., 2020). We studied a cohort of individuals with Rett syndrome where CSF was collected as part of a phase I clinical trial (Khwaja et al., 2014). These participants were subjected to extended treatment with recombinant IGF-1 during the trial. CSFs were collected from 12 female individuals (5.7 ± 2.5 years, range 2–10 years, average ± SD (Khwaja et al., 2014)). In seven participants, fluids were collected before and after treatment; in two participants, collections occurred only before treatment; and in three participants, CSF was sampled only after IGF-1 treatment. The phase I clinical trial did not include typically developing control participants because of ethical constraints (Khwaja et al., 2014).
Even though IGF-1 treatment did not improve clinical outcomes in Rett subjects (O'Leary et al., 2018), we reasoned that if IGF-1 treatment were to modify some aspects of the CSF proteome, it should do so by changing CSF proteins whose expression was Mecp2-sensitive in rodents. In addition, we hypothesized that any IGF-1-induced proteome modifications in individuals with Rett syndrome should be in the opposite direction of what we observed in Mecp2-null rodents and conditioned media from MECP2 null neurons. The Rett CSF proteome revealed a discrete number of proteins sensitive to IGF-1 (Figures 2I–2K). We identified by volcano thresholding 7 proteins whose expression was decreased after IGF-1 treatment and 17 proteins whose expression was increased (Figures 2I–2K). Of importance, the most prominent among the IGF-1-upregulated proteins were the high-density lipoprotein proteins APOA1, APOC1, and APOM, a change precisely in the opposite direction of what we found in mouse and rat Mecp2 mutants. Apolipoproteins APOC1, APOC2, and APOM as well as the apolipoprotein regulatory factor PCSK9 were also identified by machine learning (Figure 2K). In fact, APOM was assigned the top priority as a discriminatory factor in a decision tree segregating Rett participants by their IGF-1 treatment (Figure 2K). Our data suggest that discrete CSF proteome changes correlate with IGF-1 treatment in Rett female individuals. In the case of apolipoproteins, protein identity and the direction of change in humans can be informed from the study of Mecp2 mutant CSF proteomes in preclinical animal models.
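The two classifier performance measures reported above, the area under the ROC curve and the confusion matrix expressed as percentages, can be illustrated with a short Python sketch. It is not the XGBoost or AdaBoost workflow used in the study. It assumes that each sample already has a classifier score and a predicted class, computes the area under the curve through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), and assumes that the confusion matrix is normalized per actual class. The function names and the values are hypothetical, and the ± ranges reported in the figure legends are not reproduced here.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def confusion_percentages(actual, predicted, labels):
    """Percent of samples of each actual class assigned to each predicted class."""
    matrix = {}
    for a in labels:
        total = sum(1 for x in actual if x == a)
        row = {}
        for p in labels:
            count = sum(1 for x, y in zip(actual, predicted) if x == a and y == p)
            row[p] = 100.0 * count / total if total else 0.0
        matrix[a] = row
    return matrix


# Hypothetical classifier outputs for illustration only; not data from this study.
mutant_scores = [0.9, 0.8, 0.4]
wild_type_scores = [0.7, 0.3, 0.2]
auc = roc_auc(mutant_scores, wild_type_scores)

actual = ["wt", "wt", "mut", "mut"]
predicted = ["wt", "mut", "mut", "mut"]
cm = confusion_percentages(actual, predicted, ["wt", "mut"])
print(round(auc, 3), cm)
```

An area under the curve of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes.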
Composite ontologies of MECP2/Mecp2 mutant cerebrospinal fluids and conditioned media
The secreted proteomes from human cultured neurons, rat, mouse, and human mutant CSFs revealed some common proteins across these diverse experimental systems. These proteins belong to apolipoproteins, complement, or mitochondrial pathways (Figure 3A). We asked if these Mecp2-sensitive hits overlapped just as isolated occurrences or, instead, secreted proteomes obtained from each experimental system sampled a common ontological space. We tested this hypothesis using ClueGO, an application that performs composite and comparative enrichment tests based on hypergeometric distributions (Bindea et al., 2009). To test the robustness of our ontological predictions, we used HumanBase, a genomics data-driven Bayesian machine learning algorithm that identifies functional modules in tissues and cells (Greene et al., 2015). We collated proteins from each of the four Mecp2- and MECP2-deficient experimental systems, selected by volcano thresholding plus machine learning (Figure 1A), to identify a space of shared ontologies (Figure 3A). We simultaneously queried a composite of ontological databases with ClueGO (GO CC, REACTOME, KEGG, and WikiPathways). Each of the four mutant proteome datasets was tagged in ClueGO to discern their individual contribution to each ontology. We identified a space of 87 ontologies significantly represented in all the Mecp2-, MECP2-sensitive, and human Rett CSF proteome datasets (Bonferroni corrected p values < 10 × 10⁻³, Figure 3B). These ontologies revealed a significant enrichment of hits in mitochondrial compartments and pathways, pyruvate and amino acid metabolism, complement subunits, HDL lipoproteins, and synapse-related ontologies (Figure 3B). Of importance, these ontologies were qualified as non-dataset specific, as each mutant proteome dataset contributed less than 50% of hits to each one of these ontologies (Figures 3C and 3D).
For example, the HDL particle ontology was made up by 34, 26, 16, and 25% of hits derived from mouse, rat, neuron conditioned media, and human Rett CSF, respectively (Figure 3D, GO:0034364, Bonferroni corrected group p value = 2.81E-13). We confirmed these ontological findings using HumanBase, where we identified lipid and cholesterol transport ontologies as the most significantly enriched functional modules (Figure 3E). This outcome was similar if we performed HumanBase analyses either with astrocyte- or neuron-centric queries (Figure 3E). Finally, we confirmed that the Mecp2-, MECP2-sensitive, and human Rett CSF proteomes were enriched in HDL, synaptic, and mitochondria annotated proteins by interrogating different databases. We used the curated HDL proteomes database, the SynGO knowledge base of annotated synaptic proteins, and the MitoCarta 3.0 database of annotated mitochondrial proteins (Figures 3F–3H) (Davidson and Shah, 2021; Koopmans et al., 2019; Rath et al., 2021). The collated Mecp2- and MECP2-sensitive proteome contained 80 proteins in common with the HDL proteome (Figure 3F). This represents a significant 7.9-fold enrichment above what is expected by chance (Figure 3F, p < 9.26E-49). Among these overlapping HDL proteins, we found diverse apolipoproteins, complement subunits, antiproteases of the Serpin family, and factors such as clusterin (Clu) and PCSK9. All these overlapping HDL components formed an interconnected network of protein-protein interactions as determined with the GeneMANIA application (Figure 3F) (Montojo et al., 2010). Similarly, the collated Mecp2-, MECP2-sensitive, and human Rett CSF proteome was also significantly enriched, >2-fold above what is predicted by chance, in interconnected synaptic or mitochondrial proteins. These findings demonstrate that Rett syndrome CSF proteomes from diverse species converge on a common set of ontologies.
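The overlap statistics quoted in this section, an exact hypergeometric probability and a fold enrichment above chance, can be sketched as follows. This is an illustrative calculation under stated assumptions, not the procedure used by ClueGO or by the authors: it assumes that the enrichment is the observed overlap divided by the overlap expected for two independent lists drawn from a common background, and that the p value is the upper tail of the hypergeometric distribution. The function names and the numbers are hypothetical, and the size of the background population, which strongly influences both statistics, is not stated in this section.

```python
from math import comb


def representation_factor(overlap, size_a, size_b, population):
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / population
    return overlap / expected


def hypergeom_upper_tail(overlap, size_a, size_b, population):
    """P(X >= overlap) when size_a proteins are drawn without replacement from a
    background of `population` proteins that contains size_b annotated proteins."""
    total = comb(population, size_a)
    upper = min(size_a, size_b)
    favorable = sum(
        comb(size_b, k) * comb(population - size_b, size_a - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical list sizes for illustration only; not data from this study.
rf = representation_factor(overlap=3, size_a=5, size_b=5, population=20)
p = hypergeom_upper_tail(overlap=3, size_a=5, size_b=5, population=20)
print(round(rf, 2), round(p, 4))
```

A representation factor above 1 indicates more overlap than expected for two independent lists, and a value below 1 indicates less.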
Figure 3
Convergent ontologies inferred from Rett syndrome CSF and CSF-mimic proteomes
(A) Venn diagram depicting hit overlaps among the four experimental systems studied. Hits represent the sum of volcano thresholding- and machine learning-selected hits.
(B and C) Integrated gene ontology analysis of the four datasets in A annotated with the experimental system that originated the dataset. Nodes represent individual ontologies. GO_CellularComponent, KEGG, Reactome, and WikiPathways were queried with the ClueGo application. All ontologies have p<0.001. Exact hypergeometric probability Bonferroni corrected. C, shows the percent contribution of each experimental system to each ontology. Gray denotes ontologies represented by all four experimental systems.
(D) Pie charts of the percent contribution of each experimental system to the top ontologies identified in (B and C), p values hypergeometric probability Bonferroni corrected for the ontology.
(E) Shows ontology analysis using the same datasets in B-C but using the Bayesian engine HumanBase. Analyses were performed either with astrocyte- or neuron-centric queries. Nodes correspond to genes and their functional interrelationships. Nodes are grouped into clusters M(n), prioritized by q value calculated using one-sided Fisher’s exact tests and Benjamini–Hochberg corrections. Most significant cluster is M1.
(F–H) Show Venn diagrams and protein-protein interaction networks of the sum of datasets presented in A overlapping with the curated HDL proteome in F, the Synapse knowledge database of annotated genes in (G), and the Mitocarta 3.0 annotated mitochondrial proteins in (H). Venn diagram p values calculated with exact hypergeometric probability and the representation factor (RP) estimates enrichment beyond what is expected by chance.
See extended ontological data in Table S4.
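The percent contributions shown in panels C and D, and the criterion that a shared ontology receives less than 50% of its hits from any one experimental system, can be sketched as follows. This is an illustrative sketch only: the function names and hit counts are hypothetical, and the requirement that every system contribute at least one hit is an assumption based on the description of ontologies represented by all four systems.

```python
def percent_contribution(hits_by_system):
    """Percent of an ontology's hits contributed by each experimental system."""
    total = sum(hits_by_system.values())
    return {system: 100.0 * n / total for system, n in hits_by_system.items()}


def is_shared_ontology(hits_by_system, max_percent=50.0):
    """True if every system contributes hits and none reaches max_percent."""
    shares = percent_contribution(hits_by_system)
    return all(0.0 < share < max_percent for share in shares.values())


# Hypothetical hit counts for one ontology, NOT taken from the paper.
example = {"mouse CSF": 17, "rat CSF": 13, "neuron media": 8, "human CSF": 12}
print(percent_contribution(example))
print(is_shared_ontology(example))
```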
Functional validation of Rett syndrome CSF ontologies
We tested the predictive value of the ontologies derived from the collated Mecp2-and MECP2-sensitive CSF proteomes by measuring functional outcomes inferred from these ontologies. Mitochondrial ontologies were significantly enriched across the collated Mecp2-and MECP2-sensitive CSF proteomes (Figures 3B–3E and 3H) suggesting mitochondrial dysfunction in MECP2/Mecp2-deficient cells. Therefore, we measured mitochondrial respiration with Seahorse flow oximetry. We used as a model the post-mitotic human neurons differentiated from the LUHMES cell line (Figure 1), with their MECP2 gene edited by CRISPR-Cas9 (Shah et al., 2016). These MECP2-null cells were infected with a lentivirus expressing the human MECP2 gene from a vector also encoding a Red Fluorescent Protein (TagRFP) under the control of a MECP2 minimal promoter. Controls were performed with MECP2 null cells infected with a virus carrying just the TagRFP reporter (Figure 4). Viral expression of MECP2-RFP restored MECP2 expression in most null cells (>80% infection) as determined by immunoblotting and immunofluorescent microscopy with antibodies against MECP2 or RFP (Figure 4A, compare lanes 2 with 4–6, and Figure 4B). Similarly, most cells infected with a control RFP virus expressed this fluorescent protein (Figure 4B, >80% infection). We used these conditions to measure mitochondrial oxygen consumption (Figure 4C) 72 h after viral infection in two MECP2 null clones (Figure 4D, triangles and circle symbols). Basal, oligomycin-sensitive, and maximal mitochondrial respiration were significantly increased by MECP2-RFP expression as compared to RFP controls. Non-mitochondrial oxygen consumption rates, measured after treatment with rotenone and antimycin to abolish mitochondrial respiration, were not altered by MECP2-RFP expression. These results demonstrate that re-expression of MECP2 increases mitochondrial oxygen consumption in MECP2-null post-mitotic neurons. 
These mitochondrial respiration phenotypes strengthen the argument that ontologies derived from Mecp2- and MECP2-sensitive CSF proteomes could predict disease biomarkers.
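The respiration parameters reported above are conventionally derived from the oxygen consumption rate (OCR) measured before and after each drug addition. The sketch below shows that arithmetic under the usual stress test definitions; the function name and OCR values are hypothetical, and the authors' exact calculations may differ.

```python
def stress_test_parameters(baseline, after_oligomycin, after_fccp, after_rot_aa):
    """Derive respiration parameters from OCR readings (e.g., pmol O2/min).

    baseline:         OCR before any drug addition
    after_oligomycin: OCR after ATP synthase inhibition
    after_fccp:       OCR after uncoupling with FCCP
    after_rot_aa:     OCR after rotenone plus antimycin
    """
    return {
        # Oxygen consumption that persists once the respiratory chain is blocked.
        "non_mitochondrial": after_rot_aa,
        # Mitochondrial share of the baseline rate.
        "basal": baseline - after_rot_aa,
        # Respiration coupled to ATP synthesis.
        "oligomycin_sensitive": baseline - after_oligomycin,
        # Uncoupled, maximal mitochondrial rate.
        "maximal": after_fccp - after_rot_aa,
    }


# Hypothetical readings, NOT taken from the paper.
print(stress_test_parameters(100.0, 40.0, 180.0, 20.0))
```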
Figure 4
Expression of MECP2 increases mitochondrial respiration in MECP2 null post-mitotic human neurons
Wild type and MECP2 null neurons differentiated from the LUHMES cell line were infected with lentiviruses encoding RFP or MECP2-RFP for 72 h.
(A) presents immunoblot of wild type and MECP2 null cells (lanes 1 and 2) mock infected or infected with increasing amounts of a virus encoding MECP2-RFP (lanes 3-6).
(B) depicts indirect immunofluorescence with RFP antibodies and DAPI for DNA of MECP2 null cells infected with a virus encoding either RFP or MECP2-RFP. Calibration bar 50 μm.
(C and D) Seahorse stress test of cells as in A and B. Arrows a to c indicate the moment of addition of oligomycin to derive ATP dependent respiration (D), FCCP to drive maximal respiration (D), and rotenone plus antimycin to estimate non-mitochondrial respiration (D), respectively. p values were calculated with a two-sided permutation t test. Circles and triangles in (D), represent independent mutant LUHMES clones. (C) and (D) Mean ± Standard Error of the Mean.
Biochemical and genetic validation of Rett syndrome CSF ontologies
We used convergent Rett ontologies to inform a selection of proteins for confirmatory studies and to assess their potential as disease biomarkers. The HDL lipoprotein proteome was the most significantly enriched ontology among all mutant secreted proteomes (Figures 3B, 3E, and 3F). We performed absolute quantification (AQUA) of proteins by mass spectrometry to confirm expression changes in HDL apolipoproteins in CSF from independent cohorts of male Mecp2-null rats (Figure 5A) and Mecp2-null male mice (Figures 5B and 5C) (Gerber et al., 2003). We extended these AQUA confirmatory studies to CSF from female Mecp2 mice (Figures 5D–5E), a model with genetic validity for the human disease, which preponderantly affects females. We selected female Mecp2 mice at six weeks of age to compare to male Mecp2 mice of the same age. These female mice offered a way to measure how gene dosage plus a pre-symptomatic status could affect the differential expression of an analyte in CSF. Mecp2 deficient males have overt symptoms at three weeks of age whereas female animals develop symptoms at ∼3 months (Guy et al., 2001; Schaevitz et al., 2013; Ribeiro and MacDonald, 2020). Finally, we tested by ELISA CSF proteins differentially expressed in Rett syndrome female human subjects before and after IGF1 treatment (Figure 5F), using the experimental design and subject cohort studied by Khwaja et al. (Figures 2J–2L) (Khwaja et al., 2014).
Figure 5
Ontologically selected and confirmed Rett syndrome CSF proteome hits
(A–E) Independent confirmatory analyses using isotopologue peptides mapping the primary sequence of the indicated proteins using AQUA mass spectrometry in rat CSF (A) or a modified AQUA approach in Mecp2 null male mouse CSF (B and C) and in Mecp2 heterozygotic female mouse CSF (D and E). In A, femtomoles of the endogenous CSF peptide were normalized to a randomly chosen control sample. In (B–E), the ratio of the CSF endogenous peptide to the isotopologue peptide was used to quantify relative analyte abundance. Gray bars correspond to wild type CSF and blue bars to Mecp2 mutant CSF either from male null (B and C) or female heterozygotes (D and E). All analytes were measured independently with 2–3 isotopologue peptides as standards. In A two batches of 5 rats of each genotype were used whereas in (B and C) one batch of 10 mice of each genotype was analyzed, and in (D and E) one batch of 11 females per genotype was used.
(C and E) depict heatmap of all modified AQUA determinations performed in mouse CSF samples selected because they showed significant differences between genotypes in males. Every isotopologue peptide corresponds to a row. Data are depicted as row median divided by the row median absolute deviation both as heatmap and by symbol size.
(F) MesoScale ELISA confirmation of APOA1 levels in Rett individuals CSF before and after IGF-1 treatment.
p values in (A, B, D and F) were calculated with a two-sided permutation t test. See Table S5 for raw data.
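The two-sided permutation test named in this legend can be sketched as below. This minimal example enumerates every relabeling of the pooled samples and uses the absolute difference in group means as the test statistic, which is an assumption made for simplicity; the authors' implementation and statistic are not specified in this section, and the example values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation p value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical abundances, NOT taken from the paper.
print(permutation_test([1, 2, 3], [10, 11, 12]))
```

Exhaustive enumeration is practical for cohorts of the sizes described here (10 or 11 animals per genotype); larger cohorts would usually call for random resampling instead.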
AQUA quantification of the HDL lipoprotein particle components Apoa1, Apob, Pon1, C3, and C9 revealed decreased levels in Mecp2 null males as compared to wild type rat male CSF (Figure 5A). Levels of loading controls, App and A2m, were not affected by genotype in male rats (Figure 5A). These findings were extended to Mecp2-null male mouse models (Figures 5B and 5C). 
We used isotopologue peptide standards to measure proteins associated with HDL particles in mouse male CSF. Diverse apolipoproteins (Apoa1, Apoc1, Apoc2, Apoe), complement subunits (C6, C7, C8a, C8b, C8g, C9), and antiproteases (A1at5, Serpina3k) were reduced in Mecp2 null males compared to controls (Figures 5B and 5C). Similarly, the levels of other secreted proteins such as the neurosecretory protein VGF, proenkephalin-A (Penk), cathepsin Z (Ctsz), or the transmembrane epidermal growth factor receptor (Egfr) were robustly and significantly reduced in the CSF of Mecp2-null male mice (Figure 5C). To address if these analytes are sensitive to the dosage of a Mecp2-null allele and the presence of symptoms, we used AQUA to measure these analytes in CSF from heterozygote Mecp2 female mice at 6 weeks of age. HDL-annotated proteins, such as Apoa1, Apoc1-2, Apoe, C7, C8, and Clu, did not detectably change their abundance in the CSF from heterozygote Mecp2 female mice as measured by AQUA. This was despite their robust change in the CSF of two male Mecp2 rodent models of the same age (Figures 5A–5C). However, the levels of Vgf were significantly reduced in heterozygote Mecp2 female mice CSF (Figures 5D and 5E) to an extent similar to that found in Mecp2-null male mice.
Finally, we focused on the nine human subjects where CSF samples were obtained before and after IGF1 treatment. Treatment modified the Rett CSF composition discretely, mostly driven by changes in the levels of some apolipoproteins such as APOA1 (Figures 2J–2L). We confirmed by ELISA that IGF1 treatment increased the content of APOA1 in Rett participants' CSF (Figure 5F). In contrast, two proteins whose expression was not modified by treatment in TMT mass spectrometry quantifications, APOE and B2M (Figures 2J–2L), did not change their levels after treatment in ELISA assays (Figure 5F). These results confirm protein hits and ontologies in human and rodent Rett syndrome models.
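The quantification with isotopologue peptide standards described above rests on a simple ratio of endogenous to standard peptide signal. The sketch below illustrates the ratio, its conversion to femtomoles when the spiked amount is known, and normalization to a reference sample. Function names, peak areas, and spiked amounts are hypothetical and are not taken from the study.

```python
def endogenous_to_standard_ratio(endogenous_area, isotopologue_area):
    """Relative abundance: endogenous peptide signal over isotopologue signal."""
    return endogenous_area / isotopologue_area


def endogenous_femtomoles(endogenous_area, isotopologue_area, spiked_femtomoles):
    """Absolute amount, assuming a known quantity of spiked isotopologue peptide."""
    ratio = endogenous_to_standard_ratio(endogenous_area, isotopologue_area)
    return ratio * spiked_femtomoles


def normalize_to_reference(values, reference_index=0):
    """Express every sample relative to one chosen control sample."""
    reference = values[reference_index]
    return [value / reference for value in values]


# Hypothetical peak areas and spiked amount, NOT taken from the paper.
amounts = [endogenous_femtomoles(a, 100.0, 10.0) for a in (50.0, 100.0, 25.0)]
print(amounts)
print(normalize_to_reference(amounts))
```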
Ontology selected analytes performance as putative biomarkers of Rett syndrome
To evaluate the potential of ontologically selected Mecp2-sensitive hits to serve as disease biomarkers, we addressed the following questions. First, do CSF Mecp2 hits discriminate between genetic forms of autism spectrum disorder? We selected a null mutation in the Cdkl5 gene, which is causative of the CDKL5 deficiency disorder, an X-linked neurodevelopmental disorder. The behavioral and brain anatomy phenotypes of a mouse model of this syndrome closely mimic those in Rett syndrome mouse models (Zerbi et al., 2021). We found that the CSF proteome of Cdkl5 male mutants was different from Mecp2 male mutant CSF, as indicated by the 15 AQUA confirmed proteins selected from Mecp2 CSF whose levels remained unchanged in Cdkl5 null male CSF (Figure 6A). There is no significant overlap between Mecp2 and Cdkl5 male mutant CSFs when considering all differentially expressed proteins in CSF from Mecp2-null male rodents as compared to all proteins differentially expressed in the CSF from Cdkl5-null male mice. The latter we identified by TMT mass spectrometry in two cohorts of Cdkl5-null males totaling 20 animals per genotype (Figure 6B, hypergeometric test, and unpublished data). Similarly, there was minimal ontological overlap between Mecp2-null and Cdkl5-null male mice as assessed with the Metascape tool (Figure 6C) (Zhou et al., 2019). These findings demonstrate that Mecp2 CSF differentially expressed proteins discriminate between phenotypically related forms of neurodevelopmental disorders.
Figure 6
Mecp2 and Cdkl5 mutant CSFs are distinct
(A) Heatmap of ontology selected and confirmed analytes from Figures 5B and 5C tested in the CSF of Cdkl5-null mice. Data correspond to the normalized abundance measured by TMT mass spectrometry in two independent cohorts of wild type and Cdkl5 mutant males (total n = 20 per genotype). Data are depicted as row median divided by the row median absolute deviation both as heatmap and by symbol size.
(B) Venn diagram of total overlap between the collated Mecp2- and MECP2-sensitive CSF proteomes (Σ CSFs Mecp2) and all differentially expressed proteins in Cdkl5 mutant males as determined by TMT mass spectrometry.
(C) Shared ontologies as defined by the Metascape tool using the datasets in (B).
See extended data in Table S5.
Second, we interrogated whether selected Mecp2 CSF proteins were expressed in disease-relevant cell types using publicly available datasets (Figures 7A and 7B) (Scholz et al., 2011; Zhang et al., 2016; Yao et al., 2021). HDL apolipoproteins such as Apoa1, Apoe, Pon1, and Clu are expressed in neurons and astrocytes (Figure 7A). Single cell RNAseq datasets showed that transcripts encoding these HDL proteins were expressed in diverse populations of glutamatergic and GABAergic neurons across the multiple layers of the cortex and hippocampus (Figure 7B). Apoc1 mRNA or its protein were undetectable in neurons and glia but present in plasma and expressed in the choroid (compare Figures 7A and 7B and Xu et al., 2021). In contrast, complement components (C7, C8b, C8g, and C9), growth factors (Igf1 and Vgf), mitochondrial proteins (Acat1, Coq9, and Micu1), and synaptic annotated proteins (Snap25, Stx1b, Syn1, and Syn2) were expressed in diverse glial and neuronal cell populations to a different degree (Figure 7B). For example, C8b and C9 mRNAs were expressed in a discrete neuronal population whereas C7 was broadly expressed in cortex and hippocampus (Figure 7B). 
Thus, ontologically selected protein hits are expressed in disease-relevant neuronal and glial cell types.
Figure 7
Expression patterns of ontology selected analytes in brain cells and plasma
(A) Fold enrichment and rank order of mRNAs most expressed in neurons and astrocytes according to Zhang (Scholz et al., 2011; Zhang et al., 2016) or abundance in the plasma proteome according to Geyer et al. (Geyer et al., 2016). Superimposed are CSF hits. Venn diagrams present overlaps with each cell type gene expression category or plasma proteome.
(B) Depicts a t-SNE cell atlas generated with the expression levels of all transcripts encoding selected hits from the mouse CSF Mecp2-sensitive proteome. The t-SNE atlas encompasses >20 areas of mouse cortex and hippocampus, totaling 76,307 cells (Yao et al., 2021). Color codes denote neuronal subclasses described by Yao et al. (Yao et al., 2021). Neurotransmitter annotation is depicted as well as the expression levels of Mecp2 mRNA across brain regions and cell types. Each atlas depicts the mRNA expression of the indicated analyte. Note analytes such as Apoc1 whose mRNA is not detectable in this dataset. t-SNE cell atlases were assembled using the Allen single-cell RNAseq dataset as described by Wynne et al. (Wynne et al., 2021).
(C) ROC analysis of selected mouse CSF Mecp2-sensitive hits either quantified by TMT mass spectrometry (gray symbols) or by modified AQUA (blue symbols) in males. Pink symbols depict Mecp2 female mice. Number represents p value that tests the null hypothesis that the area under the curve = 0.50 (non-discriminatory). Blue figures depict area under the curve for each analyte.
See extended data in Table S5.
Finally, we determined if selected and confirmed proteins could distinguish subjects by genotype. We performed ROC analysis focusing on the Mecp2 mutant male mouse CSF hits because the mouse TMT dataset is the largest animal cohort (Figure 7C). Each ontologically selected analyte efficiently distinguished genotypes irrespective of whether it belonged to the HDL lipoprotein, synapse, or mitochondrion category. 
In fact, the ROC areas under the curve of all analytes were between 0.77 and 0.9 with significant p values (Figure 7C). In other words, for each analyte there is a 77 to 90% chance that a randomly chosen wild type and a randomly chosen mutant CSF sample are ranked correctly by genotype. The ROC performance of an analyte was similar whether TMT or AQUA male datasets were analyzed (Figure 7C, compare gray and blue symbols). A similar ROC performance was obtained with Vgf AQUA measurements in heterozygote Mecp2 female mouse CSF samples (Figure 7C, pink symbols). None of these validated analytes experienced modifications in their mRNA expression across diverse brain regions in Mecp2 mutant male mice, indicating that the utility of these analytes as putative biomarkers of Mecp2 gene defect is restricted to their protein levels in CSF (Figure S2, Table S6). These findings demonstrate that CSF analytes identified in rodent models of Rett syndrome are sensitive, conserved, and represent specific candidates for Rett biomarkers with potential for human applications.
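The ROC area under the curve discussed above equals the probability that a randomly chosen sample from one genotype ranks above a randomly chosen sample from the other. A minimal Python sketch of this pairwise definition follows. It assumes the analyte is lower in mutant CSF, as reported for the confirmed hits, and the abundance values are hypothetical.

```python
def roc_auc(wild_type, mutant):
    """Area under the ROC curve for an analyte that is reduced in mutants.

    Computed as the fraction of (wild type, mutant) pairs in which the
    wild type value is higher; ties count as half a correct ranking.
    """
    correct = 0.0
    for w in wild_type:
        for m in mutant:
            if w > m:
                correct += 1.0
            elif w == m:
                correct += 0.5
    return correct / (len(wild_type) * len(mutant))


# Hypothetical analyte abundances, NOT taken from the paper.
print(roc_auc([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 6.5]))
```

An area of 0.50 corresponds to a non-discriminatory analyte, and 1.0 to complete separation of the genotypes.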
Discussion
Here we demonstrate that an X-linked neurodevelopmental disorder, Rett syndrome, reproducibly and distinctively impacts the composition of the CSF. We defined a consensus proteome and ontological categories shared by four experimental systems across three species deficient in Mecp2/MECP2. These proteomes converged on proteins annotated to HDL lipoproteins, complement cascade, mitochondrial compartments, citrate cycle/pyruvate metabolism, synapse compartments, and secreted factors such as Vgf. The robustness of our findings is founded on the multipronged nature of our experimental design, which includes diversified in vitro and in vivo systems, multiple species studied, distinct mathematical processing of datasets capturing similarly annotated proteins, replication across different proteomic platforms (LFQ and TMT), and replicability across three sites for CSF collection and two sites of mass spectrometry analysis (Figure 1C). Although, the different mutant CSF proteomes produced discrete overlap across individual analytes, they all shared significant overlap at the ontology level. We functionally confirmed mitochondrial compartment-related ontologies, which were bioinformatically predicted, by discovering mitochondrial respiration sensitive to MECP2 expression in human neurons (Figure 4). We used these convergent ontologies to inform the selection of analytes for orthogonal confirmatory efforts. These confirmatory approaches independently validated our LFQ and TMT findings. Confirmed analytes provided additional proof of principle to the use of convergent ontologies as a strategy to select analytes suitable to discriminate a mutant genotype across species and quantification platforms. For example, even though Apoa1 was not identified as a significant hit in the mouse CSF TMT proteome, selection of Apoa1, based on the ontology to which it belongs, predicted and resulted in robust confirmation across all species studied and platforms used. 
Ontologically selected hits performed well as putative biomarkers as determined by ROC analysis and the capacity of multiple Mecp2-sensitive hits to discriminate Mecp2 mutant CSF from another phenotypically related neurodevelopmental disorder, the CDKL5 deficiency disorder in male preclinical disease models. Of the analytes identified in our studies, we think the expression of APOA1 and VGF are likely the best candidates for CSF biomarkers of disease. These two proteins were modified in female Rett cases or female heterozygotic mouse either by genotype or IGF-1 treatment. We propose that Mecp2 mutant CSF ontologies inform robust CSF analytes to act as Rett syndrome biomarkers in humans. This contention awaits confirmation in human CSF from controls and Rett cases that, until recently, were not available (Zandl-Lang et al., 2022).
The mechanisms that account for the changes in the secreted proteome described here have not been explored yet. However, we reasoned that if the Mecp2 secreted proteome were to be caused directly by a Mecp2-dependent transcriptional defect, there should be parallel modifications in the cellular and secreted proteomes. We found that this is not the case. The CSF and brain cortex proteomes did not correlate. In fact, transcriptomic analysis of several ontologically selected CSF protein hits showed that none of these proteins exhibited correlated modifications in their mRNA levels in brain. Similarly, the neuronal cell and conditioned media proteomes poorly overlapped. These results argue that indirect mechanisms downstream of Mecp2-dependent transcription, such as network activity, likely drive the secreted proteome phenotypes.
We have minimized the possibility that accidental plasma contamination of the CSF is a driving factor for some of the CSF expression differences observed. 
However, CSF protein composition is defined by factors that normally transcytose from the plasma to the CSF plus contributions from neuronal and non-neuronal cells in the brain parenchyma, the choroid plexus, and ependymal cells (Tsujita et al., 2021; Stukas et al., 2014). Therefore, our findings likely represent contributions of diverse cell types in brain to the Mecp2-sensitive CSF proteome. With few exceptions, many of our CSF Mecp2-sensitive proteins could be attributed to multiple cell types. For example, Apoe and Clu (Apoj) could be ascribed to secretions from astrocytes or the choroid plexus, where Apoe and Clu (Apoj) rank among the most expressed mRNAs; they could also be ascribed to neurons, where these mRNAs are expressed at lower levels. Of importance, Apoe brain levels are locally controlled without contributions from plasma (Linton et al., 1991). We directly tested the hypothesis that the expression of apolipoproteins is cell-autonomously controlled in neurons as demonstrated by the reduced levels of Clu (Apoj) in the conditioned media of human post-mitotic neurons. On the other extreme, Vgf and Igf1 mRNAs are expressed in neurons with preferences for neuronal cell types. Such is the case of Igf1, which is mostly expressed in GABAergic interneurons and is minimally or not expressed in glia, endothelial cells, and the choroid plexus (Zhang et al., 2014; Xu et al., 2021). Thus, the Mecp2 secreted proteome offers multiple analytes to assess phenotypes in multiple brain cell types lacking Mecp2.
Vgf is uniquely reduced both in Mecp2 male and Mecp2 female mouse models among the analytes identified in CSF as responsive to the Mecp2 gene defects. In contrast, proteins annotated to HDL ontologies did not change their levels in Mecp2 female CSF whereas they changed in Mecp2 male CSF. A possible interpretation for these findings is that the levels of diverse analytes differ in their dependency on gene dosage and whether mice are symptomatic or not. 
However, an alternative interpretation is that technical limitations in measuring Vgf and HDL-annotated proteins preclude determinations of these analytes that capture genotype- and symptomology-dependency. For example, the coefficient of variation for Vgf measurements was the lowest, in both males and females of all genotypes (13-38%). However, the coefficient of variation for HDL-annotated protein measurements ranged between 93 and 201%. This variability would prevent the detection of subtle changes in analyte levels driven by gene dosage modifications. Thus, a more conservative interpretation of our results is that the female Mecp2 genotype reduces the magnitude of the differences of some CSF analytes below the threshold of detection.
All Mecp2 secreted proteomes converged on robust ontologies. Proteins were annotated most significantly to HDL lipoprotein, complement, synapse, mitochondria, and mitochondrial pathway ontologies such as citrate cycle/pyruvate metabolism. These consensus ontologies likely point to pathogenic mechanisms in Rett syndrome. For example, the effects of Mecp2 mutations on synaptic morphology, function, and plasticity have been extensively documented (Na et al., 2012; Banerjee et al., 2019). However, HDL and mitochondrial ontologies have received less attention. HDL particles are assembled by astrocytes and microglia. These lipoproteins transport cholesterol between glial cells and neurons. Thus, a possible mechanism to account for the decreased levels of HDL apolipoproteins in CSF is either a decreased production/secretion by Mecp2 deficient glial cells or an increased clearance by cells that express HDL receptors in brain, such as neurons (Wellington and Frikke-Schmidt, 2016; Liu et al., 2010). 
We favor the decreased HDL production model as it can explain the observed increased cholesterol content in brain at postnatal day 56, despite decreased expression of cholesterol synthesis enzymes and decreased de novo cholesterol synthesis, as well as the decreased cholesterol content in the CSF of Rett syndrome cases (Buchovecky et al., 2013; Zandl-Lang et al., 2022). We postulate that decreased HDL lipoproteins levels in CSF may be a factor contributing to the accumulation of cholesterol in brain and the concurrent inhibition by product of cholesterol synthesis. A second ontology strongly represented in our datasets is mitochondria compartments and pathways, which we functionally confirmed by measuring mitochondrial respiration (Figure 4). Pyruvate and lactate are increased in the CSF of individuals with Rett syndrome, and Krebs cycle metabolites are increased in the brain of Mecp2 mutant mice. This suggests connections between CSF glycolysis and Krebs cycle ontologies and proteins and these metabolites (Budden et al., 1990; Matsuishi et al., 1994). However, there are not enough studies to tie together our observations in CSF with potential models of mitochondrial dysfunction in Mecp2 mutant cells (Kriaucionis et al., 2006; Shulyakova et al., 2017; Jagtap et al., 2019; Hirofuji et al., 2018; Grosser et al., 2012). Our findings support the idea that Mecp2 mutant CSF ontologies predict putative brain mechanisms disrupted by mutations in Mecp2. We propose that Rett syndrome is a synaptic and metabolic disorder of neurodevelopment.
Limitations of the study
The invasive nature of CSF collection in humans, in particular children, is a severe limitation precluding the acquisition of CSF samples from neurotypic and diseased subjects. Thus, our human studies rely on an already studied cohort of only Rett subjects, before and after a treatment was applied to them. We did not compare to a neurotypic control group. To circumvent this limitation, we devised a strategy involving CSF-mimic fluid from cultured human neurons and CSF from two rodent models of the disease to build a robust portfolio of CSF analytes and ontologies strongly and consistently associated with Rett syndrome. The potential of these analytes and ontologies to act as biomarkers of disease awaits careful testing in humans and non-human models of the disease undergoing genetic or pharmacological therapies.
STAR★Methods
Key resources table
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Victor Faundez (vfaunde@emory.edu).
Materials availability
This study did not generate new unique reagents.
Experimental models and subject details
Human samples
Clinical features of the cohort used in these studies are described by Khwaja et al. (2014). That study was approved by the Institutional Review Board of Boston Children’s Hospital, and informed consent was obtained from the parent of each participant. CSF samples were received deidentified and remained deidentified for these studies.
Animal models
All rat experiments were carried out in accordance with the European Communities Council Directive (86/609/EEC) and with the terms of a project license under the UK Scientific Procedures Act (1986). The Mecp2 rats were maintained by crossing Mecp2 females (SD-Mecp2) with wild type Sprague Dawley males. Animals were maintained on 12-h light/dark cycles with free access to normal rat food and water. WT and Mecp2 rats at 25 days of postnatal age were weighed and assessed for the development of the RTT-like phenotypes prior to surgery.
Mouse husbandry and euthanasia were carried out as approved by the Emory University Institutional Animal Care and Use Committees. Male and female C57BL/6J, Mecp2-deficient (Mecp2), and Cdkl5-deficient (Cdkl5) mice were obtained from The Jackson Laboratory stocks (The Jackson Laboratory #000664, #003890 and #021967, respectively). All animals were 6 weeks of age. Animals were maintained on 12-h light/dark cycles with free access to mouse chow and water.
Cell lines
Female LUHMES wild-type control clone and MECP2 knock-out clone 2_7 cell lines were cultured and differentiated on Nunclon flasks and plates treated with a 44 μg/mL Poly-L-Ornithine (Sigma P3655) and 1 μg/mL fibronectin (Sigma F1141) solution overnight in a 37°C incubator. LUHMES cells were differentiated as follows: three million cells were plated in a T75 flask with proliferation media (Advanced DMEM/F12 (Gibco 12634-010) with N2 (Gibco 17502048), 2 mM L-glutamine (Sigma G7513), and 40 ng/mL beta-FGF (R&D Systems 4114-TC-01M)). After 24 h, media was changed to differentiation media (Advanced DMEM/F12 with N2, 2 mM L-glutamine, 1 mM DbcAMP (Sigma D0627), 1 μg/mL tetracycline (Sigma T7660), and 2 ng/mL GDNF (R&D Systems 212-GD-050)) for a pre-differentiation phase of two days. Pre-differentiated cells were lifted with trypsin. Trypsin activity was blocked with aprotinin after lifting the cells.
Method details
Rat CSF sample collection
Rats were anesthetized using intraperitoneal administration of an injectable cocktail of medetomidine (0.5 mg/kg) and ketamine (75 mg/kg). Once the animal was deeply anesthetized, as indicated by the absence of withdrawal reflexes (tail and limbs) and the eye positioning reflex, the surgical area was shaved and the animal was secured in the stereotaxic frame with the head tilted at roughly 45°. The surgical area was then cleaned with Hibiscrub and a surgical drape was placed around the operating area with a hole to expose only the surgical area. A skin incision was made along the midline of the skull, extending from between the eyes to 3-4 cm caudally, so that the back of the neck was fully exposed. The fascia and the superficial and deep layers of the neck muscles were then dissected to expose the membrane of the dura mater at the atlanto-occipital joint between the occipital condyles and the rostral facets of the atlas. The cisterna magna was then carefully pierced by a pulled glass pipette (1 cm long) connected to a 2.5 mL syringe through 30 cm of PE-50 tubing. A small volume of CSF entered the glass pipette by capillary action and the flow was maintained by gently pulling the plunger. The CSF was collected into cryoprotective tubes and snap-frozen immediately in liquid nitrogen. Animals were then given a lethal dose of anesthesia and decapitated; the brain was exposed and the areas of interest were dissected and snap-frozen in liquid nitrogen.
Mouse CSF collection
Our terminal CSF collection method was adapted from a previously published protocol (Boire et al., 2017). Mice were deeply anesthetized by intraperitoneal injection of a mixture of ketamine (73.5 mg/kg; Akron, USA), xylazine (9.2 mg/kg; Bayer Pharma, Germany), and acepromazine maleate (2.75 mg/kg; Boehringer Ingelheim, USA) in 0.9% (v/w) NaCl. The back of the neck overlying the occiput was first shaved, then cleaned and disinfected with 70% ethanol. Using the thumb and index finger, the mouse was placed prone with the neck in flexion on a 15 mL conical tube at an approximately 45-degree angle to access the cisterna magna using landmarks between the occipital protuberances and the spine of the atlas. A Hamilton syringe fitted with a 30 G needle was inserted through the skin at a 45-degree angle with the horizontal, to reach a depth of approximately 4 mm into the cisterna magna for CSF collection without need for an incision. The syringe was kept stable without any lateral movement and 4–12 μL of clear CSF was drawn into the syringe by slow and smooth aspiration. The CSF was immediately spun down for 30 s, and clear CSF was inspected with the naked eye and frozen immediately on dry ice. Frankly blood-contaminated samples were discarded.
LUHMES conditioned media preparation
LUHMES wild-type control and MECP2 knock-out clone 2_7 cell lines were differentiated and conditioned media was collected. To reduce background signal in mass spectrometry, the last phase of differentiation utilized high purity and BSA-free components including high purity N2 components. A 100x high purity N2 solution was made with 10 mg/mL human holo-transferrin (Sigma T4132), 0.5 mg/mL human recombinant insulin solution (Sigma I9278), 0.63 μg/mL progesterone (Sigma P6149), 1.61 mg/mL putrescine dihydrochloride (Sigma P5780), 0.52 μg/mL sodium selenite (Sigma S5261) and DMEM/F12 (Thermo Fisher 21331020). One million pre-differentiated cells were plated to each well of a Nunclon 6-well dish with 2 mL of the high purity differentiation media: DMEM/F12 (Thermo Fisher 21331020) containing high purity N2 (above), 2 mM L-glutamine (Sigma G7513), 1 mM DbcAMP (D0627), 1 μg/mL tetracycline (T7660), and 2 ng/mL GDNF (R&D Systems 212-GD-050). Cells conditioned the media for 3 days at 5% CO2 in a 37°C incubator. On the third day, the conditioned media was collected and Complete antiprotease (Roche 11697498001) was added. Cellular debris was pelleted at 16,000 x g in an Eppendorf microcentrifuge at 4°C for 20 min. The supernatant was collected and flash frozen on dry ice. A trichloroacetic acid (TCA) precipitation was done on 750 μL of the conditioned cell media by adding 9.8 μg sodium deoxycholate per 100 μL of conditioned media followed by trichloroacetic acid to 10%. The solution was incubated on ice for 20 min to precipitate proteins. The solution was centrifuged at 16,000 x g for 15 min at 4°C. The TCA supernatant was aspirated and the pellet was washed in an equal volume of ice-cold acetone and vortexed. The precipitate was repelleted by centrifugation at 16,000 x g at 4°C for 10 min. Acetone was aspirated and the pellet was lightly air-dried, dissolved in 200 μL of 8 M Urea, and flash frozen on dry ice.
LUHMES cell viral transduction
LUHMES cells were cultured as above and infected with 0–16.5 μl/mL titer with pQS136-03(fMECP2-MECP2_P2A-nucTagRFP) and pQS140-04(fhSyn-H2B(nls)-TagRFP). Virus was added during media change to pre-differentiation media. Cells were incubated in virus containing media for 48 (immunofluorescence and flux oximetry) or 72 h (lysis) and then plated onto coated coverslips for immunofluorescence, to Seahorse Flux Oximetry 96 cellular well plates for differentiation, or to culture flasks for lysis.
Lysis
For cellular lysis, cells were washed in PBS containing MgCl2 1 mM, CaCl2 0.1 mM and lysed in 8 M Urea containing Complete antiprotease. Lysates were incubated for 30 min and then sonicated with 10 quick bursts to shear DNA. Following a Bradford assay for protein concentration, lysates were prepared for SDS-PAGE and transferred to PVDF membranes. Membranes were blocked in 5% non-fat milk in TBS containing 0.05% Triton X-100. Primary antibodies were incubated at 4°C overnight (Anti-MECP2, Invitrogen PA-1-888, 1:500 rabbit polyclonal and anti-beta actin Sigma A5451 1:1000 mouse monoclonal). Secondary antibodies were Thermo HRP conjugated anti-rabbit and anti-mouse (Thermo G21234 and A10668).
Immunofluorescence
For immunofluorescence, cells were washed in PBS containing MgCl2 1 mM, CaCl2 0.1 mM, fixed in 4% paraformaldehyde for 20 min on ice, and rinsed again in PBS containing MgCl2 1 mM, CaCl2 0.1 mM. Cells were permeabilized in 0.2% Triton X-100 in PBS for 5 min and blocked in 2% BSA, 1% fish gelatin, and 15% horse serum all in PBS for 30 min at room temperature. After blocking, coverslips were incubated for 30 min at 37°C in primary antibody diluted in block solution (anti-dsRed, 1:500, Clontech catalog #632496), washed 3 times quickly in blocking solution and incubated for 30 min at 37°C with secondary antibody diluted in block solution (anti-rabbit AlexaFluor 555, 1:1000, Thermo A21429). Coverslips were washed 3 times in blocking solution, and once in PBS containing MgCl2 1 mM, CaCl2 0.1 mM followed by mounting with DAPI Fluoromount-G (Southern Biotech 0100-20). Images were captured on a widefield fluorescence laser scan confocal Ti2 microscope with Nikon A1 LFOV camera, objective Apo 60× oil λS DIC N2/NA 1.4, with Galvano scanner. DAPI acquisition was with PMT detector (EmW: 450.0, ExW: 405.0) and Alexa Fluor 555 was acquired with GaAsP detection (EmW: 595.0, ExW: 561.0).
Seahorse flux oximetry
For flux oximetry, assays were performed following the manufacturer’s guidelines on the Seahorse XFe96 analyzer. Each well contained approximately 20,000 differentiated LUHMES cells. Assay media consisted of DMEM-based assay media supplemented with 2 mM L-glutamine, 1 mM sodium pyruvate, and 10 mM glucose. Stress test drug injection strategies resulted in final well concentrations of 1 mM oligomycin, 0.25 μM FCCP, and 0.5 μM each of rotenone and antimycin A. Assay oxygen and pH recording parameters were based on the preset Agilent Seahorse Stress Test protocol in Wave 2.6.1.
Mass spectrometry Emory
Sample processing
All CSF (5 μL) samples were diluted with 50 μL of 50 mM NH4HCO3 and treated with TCEP and CAA and heated at 90°C for 10 min. The samples were digested with 1:20 (w/w) lysyl endopeptidase (Wako) at 25°C overnight. Further overnight digestion was carried out with 1:20 (w/w) trypsin (Promega) at 25°C. Resulting peptides were desalted with an HLB microelution plate (Waters, Cat# 186001828BA) and dried under vacuum.
Tandem mass tag (TMT) labeling
For each sample, labeling was performed as previously described (Higginbotham et al., 2020; Ping et al., 2018). Briefly, each sample was re-suspended in 100 mM TEAB buffer (100 μL). The TMT and TMTPro labeling reagents (ThermoFisher Cat# A34808, A44520) were equilibrated to room temperature, and anhydrous ACN (256 μL) was added to each reagent channel. Each channel was gently vortexed for 5 min, and then 41 μL from each TMT channel was transferred to the peptide solutions and allowed to incubate for 1 h at room temperature. The reaction was quenched with 5% (v/v) hydroxylamine (8 μL) (Pierce). All channels were then combined and dried by SpeedVac (LabConco) to approximately 150 μL and diluted with 1 mL of 0.1% (v/v) TFA, then acidified to a final concentration of 1% (v/v) FA and 0.1% (v/v) TFA. Peptides were desalted with a 30 mg C18 Sep-Pak column (Waters, Cat# 186008055). Each Sep-Pak column was activated with 1 mL of methanol, washed with 1 mL of 50% (v/v) ACN, and equilibrated with 2 × 1 mL of 0.1% TFA. The samples were then loaded and each column was washed with 2 × 1 mL 0.1% (v/v) TFA, followed by 1 mL of 1% (v/v) FA. Elution was performed with 2 volumes of 0.5 mL 50% (v/v) ACN. The eluates were then dried to completeness.
High pH fractionation
High pH fractionation was performed essentially as described (Ping et al., 2020; Gokhale et al., 2020) with slight modification. Dried samples were re-suspended in high pH loading buffer (0.07% v/v NH4OH, 0.045% v/v FA, 2% v/v ACN) and loaded onto a BEH C18 column (2.1 × 150 mm with 1.7 μm beads) (Waters, Cat# 186002353). A Thermo Vanquish system was used to carry out the fractionation. Solvent A consisted of 0.0175% (v/v) NH4OH, 0.01125% (v/v) FA, and 2% (v/v) ACN; solvent B consisted of 0.0175% (v/v) NH4OH, 0.01125% (v/v) FA, and 90% (v/v) ACN. The sample elution was performed over a 22 min gradient with a flow rate of 0.6 mL/min from 0 to 50% solvent B. A total of 96 individual equal volume fractions were collected across the gradient and subsequently pooled by concatenation into 48 fractions for the Mecp2 null TMT batches. For the Cdkl5 null batch, 192 fractions were collected and combined into 96 fractions. All fractions were dried to completeness by vacuum centrifugation.
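Pooling by concatenation can be illustrated with a short sketch. The text gives only the fraction and pool counts (96 into 48, and 192 into 96); the interleaving order used below, where fraction i is combined with fraction i + number of pools, is a common convention and is an assumption here, not a detail stated in the methods. The function name `concatenate_fractions` is hypothetical.

```python
from typing import List


def concatenate_fractions(n_fractions: int, n_pools: int) -> List[List[int]]:
    """Assign 1-based fraction numbers to pools by interleaving.

    Assumption: fraction i goes to pool (i - 1) mod n_pools, so early- and
    late-eluting fractions are combined. The source does not state the order.
    """
    if n_fractions % n_pools != 0:
        raise ValueError("n_fractions must be a multiple of n_pools")
    pools: List[List[int]] = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


# Counts taken from the text: Mecp2 batches (96 -> 48), Cdkl5 batch (192 -> 96).
mecp2_pools = concatenate_fractions(96, 48)
cdkl5_pools = concatenate_fractions(192, 96)

print(mecp2_pools[0])   # [1, 49]
print(cdkl5_pools[-1])  # [96, 192]
```

Under this scheme each pool combines two fractions that elute far apart in the high pH gradient, which spreads peptides more evenly across the subsequent low pH separations.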
Liquid chromatography tandem mass spec for TMT
Each of the peptide fractions was re-suspended in loading buffer (0.1% FA, 0.03% TFA, 1% ACN). Peptides were separated either on a self-packed C18 (1.9 μm) (Dr. Maisch, Germany, Cat# ReproSil-Pur: 120C18-AQ 1.9 um 1 g) fused silica column (15 cm × 100 μm internal diameter) (New Objective, Cat# FS360-100-15-N-5-C25) or on a 1.7 μm CSH C18 column (15 cm × 150 μm internal diameter) (Waters, Cat# 186008814). An Easy nLC 1200 (Thermo Fisher Scientific) or Ultimate 3000 RSLCnano (Thermo Scientific) was used to elute the peptides. Mass spectra were collected on either a Fusion Lumos or a Fusion Eclipse mass spectrometer. Both mass spectrometers were outfitted with the FAIMS Pro ion mobility source.
Liquid chromatography tandem mass spec for PRM
AQUA standard peptides (Thermo Fisher Scientific) were spiked into digested mouse CSF samples. For each sample, an equivalent of 1 μL of CSF was loaded onto a Waters 1.7 μm CSH C18 column (15 cm × 150 μm internal diameter). Peptides were eluted using an Ultimate 3000 RSLCnano and PRM spectra were collected using an Orbitrap HFX mass spectrometer.
Data processing protocol
All TMT raw files were searched using Thermo’s Proteome Discoverer suite (version 2.4.1) with Sequest HT. The spectra were searched against the rat or mouse UniProt database. Search parameters included a 20 ppm precursor mass window, a 0.05 Da product mass window, dynamic modifications for oxidized methionine (+15.995 Da), deamidated asparagine and glutamine (+0.984 Da), and phosphorylated serine, threonine and tyrosine (+79.966 Da), and static modifications for carbamidomethyl cysteines (+57.021 Da) and N-terminal and lysine-tagged TMT (+229.26340 Da or +304.207 Da). Percolator was used to filter PSMs to 0.1%. Peptides were grouped using strict parsimony and only razor and unique peptides were used for protein level quantitation. Reporter ions were quantified from MS2 scans using an integration tolerance of 20 ppm with the most confident centroid setting. PRM spectra were processed using the Skyline quantitation suite (MacLean et al., 2010).
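To make the two tolerance types concrete, the sketch below converts a relative tolerance in ppm to an absolute m/z window and checks whether an observed mass falls within it. The 20 ppm value comes from the text; the function names and the example m/z values are hypothetical, and this is only an illustration of the arithmetic, not of how Proteome Discoverer applies the windows internally.

```python
from typing import Tuple


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz: float, theoretical_mz: float, ppm: float) -> bool:
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= ppm


PRECURSOR_TOLERANCE_PPM = 20.0  # from the text

# Hypothetical precursor at m/z 1000: a 20 ppm tolerance is +/- 0.02 m/z.
low, high = ppm_window(1000.0, PRECURSOR_TOLERANCE_PPM)
print(f"{low:.2f} to {high:.2f}")

print(within_ppm(1000.015, 1000.0, PRECURSOR_TOLERANCE_PPM))  # 15 ppm error
print(within_ppm(1000.030, 1000.0, PRECURSOR_TOLERANCE_PPM))  # 30 ppm error
```

A ppm tolerance scales with m/z, whereas the 0.05 Da product mass window given in the text is a fixed absolute width.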
Mass spectrometry Ann Arbor
Sample preparation
For tissues, samples were washed twice in 1X PBS and lysed in 8 M urea, 50 mM Tris HCl, pH 8.0, 1X Roche Complete Protease Inhibitor and 1X Roche PhosStop. Other samples were processed directly for protein quantification using Qubit fluorometry followed by digestion overnight with trypsin. Briefly, samples were reduced for 1 h at RT in 12 mM DTT followed by alkylation for 1 h at RT in 15 mM iodoacetamide. Trypsin was added to an enzyme:substrate ratio of 1:20. Each sample was acidified in formic acid and subjected to SPE on an Empore SD C18 plate. For TMT labeling, after trypsin digestion each sample was acidified in formic acid and subjected to SPE on an Empore SD C18 plate (3M catalog# 6015 SD). Each sample was lyophilized and reconstituted in 140 mM HEPES, pH 8.0, 30% acetonitrile.
Label-free quantification mass spectrometry
A 2 μg aliquot was analyzed by nano LC/MS/MS with a Waters NanoAcquity HPLC system interfaced to a Thermo Fisher Fusion Lumos. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Luna C18 resin (Phenomenex). A 4-h gradient was employed. The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 60,000 FWHM resolution and 15,000 FWHM resolution, respectively. APD was turned on. The instrument was run with a 3-s cycle for MS and MS/MS. The acquisition order was randomized.
Data processing. Data were processed through the MaxQuant software v1.6.2.3 (www.maxquant.org). Data were searched using Andromeda with the following parameters: Enzyme: Trypsin; Database: Uniprot Rat; Fixed modification: Carbamidomethyl (C); Variable modifications: Oxidation (M), Acetyl (Protein N-term); Fragment Mass Tolerance: 20 ppm. Pertinent MaxQuant settings were: Peptide FDR 0.01; Protein FDR 0.01; Min. peptide Length 7; Min. razor + unique peptides 1; Min. unique peptides 0; Min. ratio count for LFQ 1; Second Peptides TRUE; Match Between Runs TRUE.
TMT quantification mass spectrometry
40 μL of acetonitrile was added to each TMT tag tube and mixed aggressively. Tags were incubated at RT for 15 min. Then 30 μL of label was added to each peptide sample and mixed aggressively. Samples were incubated in an Eppendorf Thermomixer at 300 rpm and 25°C for 1.5 h. Reactions were terminated with the addition of 8 μL of fresh 5% hydroxylamine solution and a 15-min incubation at room temperature. Samples were subjected to high pH reverse phase fractionation as follows. Buffer A was 10 mM NaOH, pH 10.5, in water; Buffer B was 10 mM NaOH, pH 10.5, in acetonitrile. We used XBridge C18 columns, 2.1 mm ID x 150 mm length, 3.5 μm particle size (Waters, part #186003023) attached to an Agilent 1100 HPLC system equipped with a 150 μL sample loop operating at 0.3 mL/min, with the detector set at 214 nm wavelength. Dried peptides were resolubilized in 150 μL of Buffer A and injected manually. Fractions were collected every 30 s from 1 to 49 min (96 fractions total, 150 μL/fraction). For the full proteome, 10% of each pool was analyzed by nano LC/MS/MS with a Waters NanoAcquity HPLC system interfaced to a ThermoFisher Fusion Lumos mass spectrometer. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Luna C18 resin (Phenomenex). Each high pH RP fraction was separated over a 2-h gradient (24 h instrument time total). The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 60,000 FWHM resolution and 50,000 FWHM resolution, respectively. A 3-s cycle time was employed for all steps.
Data processing. Data were processed through the MaxQuant software v1.6.2.3 (www.maxquant.org). Data were searched using Andromeda with the following parameters: Enzyme: Trypsin; Database: Uniprot Rat; Fixed modification: Carbamidomethyl (C); Variable modifications: Oxidation (M), Acetyl (Protein N-term), Phospho (STY; PO4 data only); Fragment Mass Tolerance: 20 ppm. Pertinent MaxQuant settings were: Peptide FDR 0.01; Protein FDR 0.01; Min. peptide Length 7; Min. razor + unique peptides 1; Min. unique peptides 0; Second Peptides FALSE; Match Between Runs FALSE. The proteinGroups.txt files were uploaded to Perseus v1.5.5.3 for data processing and analysis.
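The fraction count and fraction volume quoted for the high pH fractionation follow directly from the collection window, collection interval, and flow rate given above, as the short check below shows. The number of pools analyzed is not stated in the text; the value derived here from the 24 h total and the 2-h gradient assumes that instrument time consists of gradient time only, which is an assumption of this sketch.

```python
# Values stated in the text.
collection_start_min = 1.0
collection_end_min = 49.0
collection_interval_s = 30.0
flow_rate_ml_per_min = 0.3
gradient_h_per_pool = 2.0
total_instrument_h = 24.0

# Number of fractions collected across the window.
n_fractions = round(
    (collection_end_min - collection_start_min) * 60.0 / collection_interval_s
)

# Volume per fraction in microliters (flow rate x collection interval).
fraction_volume_ul = round(
    flow_rate_ml_per_min * (collection_interval_s / 60.0) * 1000.0, 3
)

# Assumption: total instrument time is gradient time only, with no overhead.
implied_pools = round(total_instrument_h / gradient_h_per_pool)

print(n_fractions)         # 96
print(fraction_volume_ul)  # 150.0
print(implied_pools)       # 12
```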
AQUA mass spectrometry
Synthetic peptides labeled with Arginine (13C6,15N4) or Lysine (13C6,15N2) at >95% purity were made by New England Peptide MA 01440 USA. The following peptides were used: Myh9 AGVLAHLEEER; IAQLEEQLDNETK. Pon1 IFFYDSENPPGSEVLR; LLIGTVFHR. App TEEISEVK; THTHIVIPYR. A2m AIAYLNTGYQR; LPSDVVEESAR. Apoa1 DYVSQFESSTLGK; WNEEVEAYR. Apob TEVIPPLIENR; GFEPTLEALFGK. C3 GLEVSITAR; SSVAVPYVIVPLK. C9 SIEVFGQFQGK; TTSFNANLALK. Thbs1 FVFGTTPEDILR; IENANLIPPVPDDK. A 3-4 μg aliquot of each CSF tryptic peptide digest was spiked with isotopologue peptides at a concentration of 100 or 133 fmol/μg peptide digest. Peptide mixes were analyzed in analytical duplicate by nano LC/PRM using a Waters NanoAcquity HPLC system interfaced to a Thermo Fisher Fusion Lumos mass spectrometer. 1.5 μg per sample was loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Luna C18 resin (Phenomenex). A 1-h gradient was employed. The mass spectrometer was operated in PRM mode without scheduling; instrument settings included 15,000 FWHM resolution, NCE 30, AGC target value 5 × 10^4, and maximum IT of 22 ms. Data were processed using Skyline v4.2.
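The text does not spell out how peak areas were converted to amounts. A minimal sketch of the standard isotope-dilution calculation is shown below, assuming the endogenous (light) peptide amount equals the light-to-heavy peak area ratio multiplied by the amount of heavy standard on column. The spike concentration (100 fmol/μg) and load (1.5 μg) come from the text; the peak areas and the function name `endogenous_fmol` are hypothetical.

```python
from statistics import mean


def endogenous_fmol(light_area: float, heavy_area: float,
                    heavy_fmol_on_column: float) -> float:
    """Estimate the endogenous peptide amount from a light/heavy area ratio.

    Assumes equal response of light and heavy forms (isotope dilution).
    """
    if heavy_area <= 0:
        raise ValueError("heavy_area must be positive")
    return light_area / heavy_area * heavy_fmol_on_column


spike_fmol_per_ug = 100.0  # from the text (100 or 133 fmol/ug digest)
loaded_ug = 1.5            # from the text
heavy_on_column = spike_fmol_per_ug * loaded_ug  # fmol of heavy standard

# Hypothetical peak areas (light, heavy) for two analytical duplicates.
duplicate_areas = [(2.0e6, 4.0e6), (2.2e6, 4.0e6)]
estimates = [endogenous_fmol(l, h, heavy_on_column) for l, h in duplicate_areas]
mean_estimate = mean(estimates)

print(heavy_on_column)
print([round(e, 2) for e in estimates])
print(round(mean_estimate, 2))
```

Averaging the analytical duplicates, as done in the last step, is one simple way to summarize the two injections per sample.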
Data processing
Proteomics data were log2 converted. Data analysis was performed with two methods. We used Qlucore Omics Explorer Version 3.6(33), normalizing data to a mean of 0 and a variance of 1. No filtering by standard deviation was applied. All data were thresholded by a log2 fold change of 0.5 and a non-corrected p value of 0.05.
The second method used OmicLearn (v1.0.0) for performing the data analysis, model execution, and generating the plots and charts (Torun et al., 2021). Machine learning was done in Python (3.8.8). Feature tables were imported via the Pandas package (1.0.1) and manipulated using the Numpy package (1.18.1). The machine learning pipeline was employed using the scikit-learn package (0.22.1). For generating the plots and charts, the Plotly (4.9.0) library was used. No normalization on the data was performed. To impute missing values, a mean-imputation strategy was used. Features were selected using an Extra-Trees (n_trees = 100) strategy with a maximum of 20 features. Normalization and feature selection were individually performed using the training data of each split. For classification, we used either an XGBoost-Classifier (random_state = 23, learning_rate = 0.3, min_split_loss = 0, max_depth = 6, min_child_weight = 1), an AdaBoost-Classifier (random_state = 23, n_estimators = 100, learning_rate = 1.0), or a RandomForest-Classifier (random_state = 23, n_estimators = 100, criterion = gini, max_features = auto). Classifiers were chosen based on the proximity of ROC curves to a value of 1. We used a repeated (n_repeats = 10), stratified cross-validation (n_splits = 5) approach (RepeatedStratifiedKFold) to classify datasets based on their genotype.
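The second method can be approximated directly in scikit-learn. The sketch below is not the OmicLearn code itself; it reassembles the stated steps (mean imputation, Extra-Trees selection of at most 20 features, a random forest with the listed parameters, and repeated stratified cross-validation) into one pipeline so that imputation and feature selection are refit on the training data of each split. The feature table is synthetic and hypothetical, and `max_features="sqrt"` stands in for `auto`, which meant the same thing for classifiers in scikit-learn 0.22 and was removed in later releases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for a protein table (samples x proteins) with
# two genotypes; about 5% of values are set to missing.
X, y = make_classification(n_samples=60, n_features=200, n_informative=10,
                           random_state=23)
rng = np.random.default_rng(23)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    # Mean imputation of missing values.
    ("impute", SimpleImputer(strategy="mean")),
    # Keep the 20 features ranked highest by Extra-Trees importance.
    ("select", SelectFromModel(
        ExtraTreesClassifier(n_estimators=100, random_state=23),
        max_features=20, threshold=-np.inf)),
    # Random forest with the parameters listed in the text.
    ("classify", RandomForestClassifier(
        n_estimators=100, criterion="gini", max_features="sqrt",
        random_state=23)),
])

# Repeated, stratified cross-validation: 5 splits x 10 repeats = 50 folds.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=23)
auc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"ROC AUC: {auc_scores.mean():.2f} +/- {auc_scores.std():.2f} "
      f"over {auc_scores.size} folds")
```

Wrapping the steps in a `Pipeline` is the design choice that keeps test-fold data out of imputation and feature selection, which matches the statement that these steps were performed on the training data of each split.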
RNAseq and single cell RNAseq
These procedures were reported before (Wynne et al., 2021). RNA extraction, library construction, and sequencing were performed by BGI. Total RNA was isolated with Trizol and quality control was done with the Agilent 2100 Bioanalyzer (Agilent RNA 6000 Nano Kit, Cat# 5067-1511) to assess total RNA sample QC: RNA concentration, RIN value, 28S/18S and the fragment length distribution.
For library generation, poly-A containing mRNAs were isolated by poly-T oligo-attached magnetic beads. mRNA was fragmented into pieces using divalent cations under elevated temperature. RNA fragments were copied into first strand cDNA with reverse transcriptase plus random primers. Second strand cDNA synthesis was done using DNA Polymerase I and RNase H. cDNA fragments underwent addition of a single ‘A’ base followed by ligation of the adapter. The products were isolated and enriched by PCR amplification. We quantified PCR yield by Qubit and pooled samples together to make a single-strand DNA circle (ssDNA circle), which gave the final library. DNA nanoballs were generated with the ssDNA circle by rolling circle replication to enlarge fluorescent signals during sequencing. DNA nanoballs were loaded into the patterned nanoarrays and paired-end reads of 100 bp were read on the BGISEQ-500 platform for data analysis. The BGISEQ-500 platform combines DNA nanoball-based nanoarrays and stepwise sequencing using the Combinatorial Probe-Anchor Synthesis sequencing method. We generated about 5.64 Gb bases per sample on average. The average mapping ratio with the reference genome was 93.47% and the average mapping ratio with genes was 67.04%; 19,972 genes were identified, of which 2,659 are novel genes. 29,781 transcripts were identified.
The sequencing reads were uploaded to the Galaxy platform, and we used the usegalaxy.org server for analysis (Afgan et al., 2018). FastQC was performed to remove samples of poor quality (Andrews, 2012). All mapping was executed with the Galaxy server (v. 21.01) running Hisat2 (Galaxy Version 2.1.0 + galaxy7), FeatureCounts (Galaxy Version 2.0.1), and DESeq2 (Galaxy Version 2.11.40.6 + galaxy1) (Kim et al., 2015; Liao et al., 2014; Love et al., 2014). We used the GRCm38 build of the reference sequence and the GTF files (Ensembl) from iGenome (Illumina). Hisat2 was run under the following conditions: paired-end, unstranded, and default settings except that a GTF file was used for transcript assembly. The aligned SAM/BAM files were processed using FeatureCounts (default settings except that the Ensembl GRCm38 GTF file was used, with output for DESeq2 and a gene length file). FeatureCounts files and raw files are available at GEO with accession GSE140054. The FeatureCounts compiled file is GSE140054_AllTissueFeatureCounts.txt.gz. Gene counts were normalized using DESeq2 (Love et al., 2014) followed by a regularized log transformation. Differential expression was determined by DESeq2 using the following settings: factors were tissue type, pairwise comparisons across tissues were done, all normalized tables were output, size estimation was the standard median ratio, fit type was parametric, and outliers were filtered using a Cook’s distance cutoff. See Table S6.
Single cell RNA seq data are from Yao et al. (2021). The gene expression data matrix (matrix.csv) and cell metadata (metadata.csv) were obtained from Whole cortex & hippocampus - smart-seq (2019) with 10x-smart-seq taxonomy (2020). Data were downloaded from the Allen Institute Portal. The dataset contains RNA sequencing data of single cells from >20 areas of mouse cortex and hippocampus and includes 76,307 single cells. The sequencing results were aligned to exons and introns in the GRCm38.p3 reference genome using the STAR algorithm, and aggregated intron and exon counts at the gene level were calculated.
Matrix files were processed with Delimit Pro for Windows 10/8.1/7. Data were exported as a tab delimited text file and analyzed with the Qlucore Omics Explorer Version 3.6(33). Data were log2 converted and normalized to a mean of 0 and a variance of 1. 2D t-SNE plots were generated using a perplexity of 40 and default settings.
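For readers unfamiliar with the "standard median ratio" size estimation named among the DESeq2 settings above, the sketch below shows the idea in plain Python: each sample's size factor is the median, across genes, of the ratio of its count to the gene's geometric mean across samples. This is an illustrative reimplementation under simplifying assumptions (genes with a zero count in any sample are skipped), not the DESeq2 code run on Galaxy; the count matrix and the function name `median_ratio_size_factors` are hypothetical.

```python
import math
from statistics import median
from typing import List


def median_ratio_size_factors(counts: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors.

    counts: one row per gene, one column per sample (raw counts).
    Genes with a zero in any sample are skipped because their geometric
    mean is zero.
    """
    n_samples = len(counts[0])
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in gene) / n_samples
                     for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - lgm
                      for gene, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
raw_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped: contains a zero
]
size_factors = median_ratio_size_factors(raw_counts)
normalized = [[c / f for c, f in zip(gene, size_factors)]
              for gene in raw_counts]

print([round(f, 4) for f in size_factors])
print([round(v, 2) for v in normalized[0]])
```

After division by the size factors, the first three genes have equal normalized counts in both samples, which is the behavior expected when the samples differ only in sequencing depth.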
Quantification and statistical analysis
Bioinformatic analyses
Gene ontology analyses were performed with ClueGo and HumanBase (Greene et al., 2015). ClueGo v2.58 was run on Cytoscape v3.8.2 (Bindea et al., 2009; Shannon et al., 2003), querying GO CC, REACTOME, KEGG and WikiPathways, considering all evidence at a Medium Level of Network Specificity, and selecting pathways with a Bonferroni corrected p value <10E-3. ClueGo was run with GO Term Fusion. HumanBase was run using default web-based parameters (Greene et al., 2015). In silico interactome data (predicted and physical interactions) were downloaded from Genemania and processed in Cytoscape v3.8.2 (Shannon et al., 2003). Interactome connectivity graph parameters were generated in Cytoscape. To compare ontologies between Mecp2 and Cdkl5 mutant males we used the online tool Metascape with the multiple gene list and express analysis settings (Zhou et al., 2019).
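The Bonferroni filter applied to ontology terms can be sketched as follows. This shows the plain Bonferroni adjustment (each p value multiplied by the number of tests, capped at 1); whether ClueGo applied this form or a step-down variant is not stated in the text. The threshold is taken here as 1e-3, which is an assumption: the text writes "10E-3", which read literally as E-notation would be 0.01. Term names and p values are hypothetical.

```python
from typing import Dict, List


def bonferroni(p_values: List[float]) -> List[float]:
    """Plain Bonferroni adjustment: p * number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p values for three ontology terms.
raw_p: Dict[str, float] = {
    "HDL particle": 2e-7,
    "complement activation": 5e-6,
    "hypothetical term": 4e-4,
}

THRESHOLD = 1e-3  # assumption; see the note on "10E-3" above

adjusted = dict(zip(raw_p, bonferroni(list(raw_p.values()))))
significant = [term for term, p in adjusted.items() if p < THRESHOLD]

print(significant)  # ['HDL particle', 'complement activation']
```

The third term passes the threshold before correction (4e-4) but not after multiplication by the three tests, which illustrates how the correction tightens the selection.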
Statistical analyses
Volcano plot p values were calculated using Qlucore Omics Explorer Version 3.6(33) without correction for multiple comparisons. Statistical analyses for the experiments in Figures 4D, 5A, 5B, 5D and 5F were performed with the engine at https://www.estimationstats.com/#/ using a two-sided permutation t-test and an alpha of 0.05 (Ho et al., 2019). ROC analysis and paired t-tests were performed with Prism v9.2.0(283).
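A two-sided permutation test of the kind named above can be sketched in a few lines. This is an illustrative implementation using the absolute difference in group means as the test statistic, not the code behind the estimationstats.com engine; the number of reshuffles, the seed, the data, and the function name `permutation_p_value` are assumptions made for the example.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(control: Sequence[float], test: Sequence[float],
                        n_resamples: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation p value for a difference in group means.

    Group labels are reshuffled n_resamples times; the p value is the
    fraction of reshuffles whose absolute mean difference is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(test) - mean(control))
    pooled = list(control) + list(test)
    n_control = len(control)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples


# Hypothetical measurements for two genotypes.
wild_type = [1.0, 1.1, 0.9, 1.05, 0.95]
mutant = [2.0, 2.1, 1.9, 2.05, 1.95]

ALPHA = 0.05  # from the text
p_value = permutation_p_value(wild_type, mutant)
print(p_value < ALPHA)  # True
```

Because the test relies only on reshuffling labels, it makes no assumption about the shape of the underlying distributions, which suits small sample sizes.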
Authors: Beth A Rasala; Arturo V Orjalo; Zhouxin Shen; Steven Briggs; Douglass J Forbes Journal: Proc Natl Acad Sci U S A Date: 2006-11-10 Impact factor: 11.205
Authors: Frank Koopmans; Pim van Nierop; Maria Andres-Alonso; Andrea Byrnes; Tony Cijsouw; Marcelo P Coba; L Niels Cornelisse; Ryan J Farrell; Hana L Goldschmidt; Daniel P Howrigan; Natasha K Hussain; Cordelia Imig; Arthur P H de Jong; Hwajin Jung; Mahdokht Kohansalnodehi; Barbara Kramarz; Noa Lipstein; Ruth C Lovering; Harold MacGillavry; Vittoria Mariano; Huaiyu Mi; Momchil Ninov; David Osumi-Sutherland; Rainer Pielot; Karl-Heinz Smalla; Haiming Tang; Katherine Tashman; Ruud F G Toonen; Chiara Verpelli; Rita Reig-Viader; Kyoko Watanabe; Jan van Weering; Tilmann Achsel; Ghazaleh Ashrafi; Nimra Asi; Tyler C Brown; Pietro De Camilli; Marc Feuermann; Rebecca E Foulger; Pascale Gaudet; Anoushka Joglekar; Alexandros Kanellopoulos; Robert Malenka; Roger A Nicoll; Camila Pulido; Jaime de Juan-Sanz; Morgan Sheng; Thomas C Südhof; Hagen U Tilgner; Claudia Bagni; Àlex Bayés; Thomas Biederer; Nils Brose; John Jia En Chua; Daniela C Dieterich; Eckart D Gundelfinger; Casper Hoogenraad; Richard L Huganir; Reinhard Jahn; Pascal S Kaeser; Eunjoon Kim; Michael R Kreutz; Peter S McPherson; Ben M Neale; Vincent O'Connor; Danielle Posthuma; Timothy A Ryan; Carlo Sala; Guoping Feng; Steven E Hyman; Paul D Thomas; August B Smit; Matthijs Verhage Journal: Neuron Date: 2019-06-03 Impact factor: 17.173
Authors: Erik C B Johnson; Eric B Dammer; Duc M Duong; Lingyan Ping; Maotian Zhou; Luming Yin; Lenora A Higginbotham; Andrew Guajardo; Bartholomew White; Juan C Troncoso; Madhav Thambisetty; Thomas J Montine; Edward B Lee; John Q Trojanowski; Thomas G Beach; Eric M Reiman; Vahram Haroutunian; Minghui Wang; Eric Schadt; Bin Zhang; Dennis W Dickson; Nilüfer Ertekin-Taner; Todd E Golde; Vladislav A Petyuk; Philip L De Jager; David A Bennett; Thomas S Wingo; Srikant Rangaraju; Ihab Hajjar; Joshua M Shulman; James J Lah; Allan I Levey; Nicholas T Seyfried Journal: Nat Med Date: 2020-04-13 Impact factor: 53.440
Authors: Bob Olsson; Ronald Lautner; Ulf Andreasson; Annika Öhrfelt; Erik Portelius; Maria Bjerke; Mikko Hölttä; Christoffer Rosén; Caroline Olsson; Gabrielle Strobel; Elizabeth Wu; Kelly Dakin; Max Petzold; Kaj Blennow; Henrik Zetterberg Journal: Lancet Neurol Date: 2016-04-08 Impact factor: 44.182
Authors: Diana A Abbasi; Thu T A Nguyen; Deborah A Hall; Erin Robertson-Dick; Elizabeth Berry-Kravis; Stephanie M Cologna Journal: Cerebellum Date: 2022-02 Impact factor: 3.847
Authors: Lingyan Ping; Duc M Duong; Luming Yin; Marla Gearing; James J Lah; Allan I Levey; Nicholas T Seyfried Journal: Sci Data Date: 2018-03-13 Impact factor: 6.444
Authors: Maria Chahrour; Sung Yun Jung; Chad Shaw; Xiaobo Zhou; Stephen T C Wong; Jun Qin; Huda Y Zoghbi Journal: Science Date: 2008-05-30 Impact factor: 47.728
Authors: Casey S Greene; Arjun Krishnan; Aaron K Wong; Emanuela Ricciotti; Rene A Zelaya; Daniel S Himmelstein; Ran Zhang; Boris M Hartmann; Elena Zaslavsky; Stuart C Sealfon; Daniel I Chasman; Garret A FitzGerald; Kara Dolinski; Tilo Grosser; Olga G Troyanskaya Journal: Nat Genet Date: 2015-04-27 Impact factor: 38.330