Differential gene expression data from the human central nervous system across Alzheimer's disease, Lewy body diseases, and the amyotrophic lateral sclerosis and frontotemporal dementia spectrum.

Ayush Noori, Aziz M Mezlini, Bradley T Hyman, Alberto Serrano-Pozo, Sudeshna Das

Abstract

In Noori et al. [1], we hypothesized that there is a shared gene expression signature underlying neurodegenerative proteinopathies including Alzheimer's disease (AD), Lewy body diseases (LBD), and the amyotrophic lateral sclerosis and frontotemporal dementia (ALS-FTD) spectrum. To test this hypothesis, we performed a systematic review and meta-analysis of 60 human central nervous system transcriptomic datasets in the public Gene Expression Omnibus and ArrayExpress repositories, comprising a total of 2,600 AD, LBD, and ALS-FTD patients and age-matched controls that passed our stringent quality control pipeline. Here, we provide the results of differential expression analyses with data quality reports for each of these 60 datasets. This atlas of differential expression across AD, LBD, and ALS-FTD may guide future work to elucidate the pathophysiological drivers of these individual diseases as well as the common substrate of neurodegeneration.
© 2021 The Authors. Published by Elsevier Inc.

Keywords:  Alzheimer's disease; Amyotrophic lateral sclerosis; Differential expression; Frontotemporal dementia; Lewy body diseases; Meta-analysis; Neurodegeneration; Transcriptomics

Year:  2021        PMID: 33665258      PMCID: PMC7903289          DOI: 10.1016/j.dib.2021.106863

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

This dataset is a comprehensive atlas of differential gene expression data from Alzheimer's disease (AD), Lewy body diseases (LBD), and the amyotrophic lateral sclerosis and frontotemporal dementia (ALS-FTD) spectrum, providing insight into the genes and functional pathways which may drive pathogenesis uniquely in each disease, as well as collectively across all three neurodegenerative proteinopathies. Researchers may leverage this dataset to identify specific genes and pathways underlying neurodegeneration for future research endeavors such as pathophysiological studies and biomarker and therapeutic development. Future directions of this work may include comparison with single-cell and single-nuclei RNA-seq studies in postmortem specimens from patients with various neurodegenerative diseases as well as healthy subjects across the lifespan.

Data Description

Each gene expression dataset from the NCBI Gene Expression Omnibus (GEO) or EMBL-EBI ArrayExpress database, contained within individual directories in our Mendeley Data repository, was categorized by both disease and brain region and studied individually, entailing 89 separate analyses. For each analysis, the following data files are provided:

1.  Data quality report generated via the arrayQualityMetrics package [2,3]. Data quality reports include dataset metadata from GEO or ArrayExpress, inter-array distance comparisons, principal component analyses, array intensity distributions, variance-mean dependence, and MA plots for individual array quality. Each report can be examined by opening the index.html file included within the appropriate subdirectory.
2.  Boxplot of normalized expression data. Boxplots indicate sample label, disease label, and outlier detection from the data quality report, as well as the 20th percentile of expression.
3.  Differential gene expression analysis. Each table lists differentially expressed genes (DEGs) along with their statistical significance, effect size, and accompanying metadata.
4.  Volcano plot of DEGs. Volcano plots represent the statistical significance (−log10 of p-value) against the effect size (log2 of fold-change).
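
The volcano plot axes described above can be illustrated with a short sketch. It is written in Python for illustration only (the analyses themselves were performed in R), and the function name, gene names, and values are hypothetical. It assumes the effect size is supplied as a raw disease/control expression ratio; a table that already stores log2 fold-changes would use that column directly as the x coordinate.

```python
import math

def volcano_point(p_value, fold_change):
    """Return (x, y) = (log2 fold-change, -log10 p-value) for one gene."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if fold_change <= 0.0:
        raise ValueError("fold_change must be a positive ratio")
    return math.log2(fold_change), -math.log10(p_value)

# Hypothetical genes: (name, p-value, disease/control expression ratio).
genes = [
    ("GENE_A", 0.001, 2.0),   # up-regulated, strongly significant
    ("GENE_B", 0.5, 1.0),     # unchanged, not significant
    ("GENE_C", 0.01, 0.25),   # down-regulated, significant
]

points = {name: volcano_point(p, fc) for name, p, fc in genes}
for name, (x, y) in points.items():
    print(f"{name}: log2FC={x:+.2f}, -log10(p)={y:.2f}")
```

Genes with large absolute log2 fold-change and small p-values land in the upper left and upper right corners of the plot, which is why this transformation is convenient for spotting candidate DEGs.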

Experimental design, Materials and Methods

All data analyses were performed in the R programming language and statistical computing environment (version 4.0.2).

Systematic review

The methodology of our systematic review of publicly available human central nervous system (CNS) gene expression datasets from AD, LBD, and ALS-FTD patients in the GEO and ArrayExpress repositories (following PRISMA guidelines [4]) is described in detail in Noori et al. [1]. Datasets were selected based on prespecified eligibility criteria. Briefly, inclusion criteria were: (1) original datasets, and (2) human microarray datasets from neuropathologically relevant CNS regions in AD, LBD, and ALS-FTD patients as well as healthy controls. This systematic review yielded 1648 control and 1586 disease samples from 60 datasets: 26 AD, 21 LBD, and 13 ALS-FTD. After the data pre-processing and quality control steps described below, a total of 2600 samples were analyzed.
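
The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is in Python for illustration only, and the variable names are hypothetical. Treating the difference between identified and analyzed samples as the number removed during pre-processing and quality control is an assumption based on the sentence above.

```python
# Counts reported in the systematic review paragraph.
control_samples = 1648
disease_samples = 1586
datasets_by_disease = {"AD": 26, "LBD": 21, "ALS-FTD": 13}
analyzed_samples = 2600

# Samples identified by the review, before pre-processing and quality control.
identified_samples = control_samples + disease_samples

# Dataset total across the three disease groups.
total_datasets = sum(datasets_by_disease.values())

# Assumed to be the samples not carried forward into the analysis.
not_analyzed = identified_samples - analyzed_samples
retained_fraction = analyzed_samples / identified_samples

print(f"identified samples: {identified_samples}")
print(f"datasets: {total_datasets}")
print(f"not analyzed: {not_analyzed}")
print(f"retained fraction: {retained_fraction:.3f}")
```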

Data pre-processing and analysis

Data were pre-processed and analyzed as described in Noori et al. [1]. Briefly, for each analysis, we used the Robust Multichip Average approach from the oligo package [5], [6] followed by the arrayQualityMetrics package [2], [3] to normalize expression data as needed, generate data quality reports, and detect outliers. Outliers were identified via boxplots, MA plots, and inter-array distance comparison. Samples that failed to pass any of these three outlier detection steps were discarded. The full data quality reports, along with boxplots of the normalized expression data [7] in which outliers are drawn as dashed lines, are available in our Mendeley Data repository [8]. Next, probes were capped to filter for low signal, followed by surrogate variable analysis [9], [10], [11]. Finally, we performed differential expression analysis using the limma package [12], [13], [14], [15], [16] and created volcano plots of the DEGs [17]. The results of our differential expression analyses and the accompanying volcano plots are also available within our Mendeley Data repository [8].
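
The outlier-removal and low-signal steps can be sketched as follows. This is a minimal Python illustration, not the R code used for the analyses, and it rests on two loudly stated assumptions: (a) a sample is discarded when at least one of the three detection steps flags it, and (b) "capping" means raising values below a chosen percentile up to that percentile, with the 20th percentile borrowed from the boxplot description. All function names, sample IDs, and values are hypothetical.

```python
import statistics

def flag_outliers(by_boxplot, by_ma_plot, by_distance):
    """Return every sample flagged by at least one of the three checks (assumption a)."""
    return set(by_boxplot) | set(by_ma_plot) | set(by_distance)

def drop_samples(expression, outliers):
    """Keep only samples (dict keys) that were not flagged."""
    return {s: probes for s, probes in expression.items() if s not in outliers}

def cap_low_signal(expression, percentile=20):
    """Raise values below the chosen percentile up to that percentile (assumption b).

    Returns the capped data and the floor value that was applied.
    """
    pooled = sorted(v for probes in expression.values() for v in probes)
    # quantiles(n=100) yields the 1st..99th percentile cut points.
    floor = statistics.quantiles(pooled, n=100, method="inclusive")[percentile - 1]
    capped = {s: [max(v, floor) for v in probes] for s, probes in expression.items()}
    return capped, floor

# Hypothetical log2 expression values: sample -> probe intensities.
expression = {
    "S1": [1.0, 5.0, 9.0],
    "S2": [3.0, 7.0, 11.0],
    "S3": [0.5, 20.0, 30.0],
}

# Hypothetical detector output: S3 is flagged by two of the three checks.
outliers = flag_outliers(by_boxplot=["S3"], by_ma_plot=[], by_distance=["S3"])
kept = drop_samples(expression, outliers)
capped, floor = cap_low_signal(kept)

print("discarded:", sorted(outliers))
print("floor:", floor)
print("capped:", capped)
```

Under a stricter reading in which a sample must be flagged by all three checks, the set union in `flag_outliers` would become an intersection; the rest of the sketch is unchanged.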

Code availability

Source code is available on GitHub (https://github.com/ayushnoori/nd-diff-expr) and Zenodo [18].

Ethics Statement

Transcriptomics datasets included in our study were obtained from public repositories and there was no interaction with living human subjects. The downloaded transcriptomics datasets were generated from deidentified postmortem CNS tissue samples.

CRediT Author Statement

Ayush Noori: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft; Aziz M. Mezlini: Methodology, Writing - review & editing; Bradley T. Hyman: Writing - review & editing, Funding acquisition; Alberto Serrano-Pozo: Conceptualization, Writing - original draft, Funding acquisition; Sudeshna Das: Supervision, Methodology, Writing - review & editing, Funding acquisition.

Funding

This work was supported by the Alzheimer's Association (AACF-17-524184 to AS-P), the National Institute on Aging (K08AG064039 to AS-P and P30AG062421 to BTH and SD), the Rainwater Charitable Foundation (to BTH), and a MassLife Sciences MassCATS award (to BTH and SD). The funding sources had no role in study design; data collection, analysis, and interpretation; or manuscript preparation.

Declaration of Competing Interest

The authors declare no competing financial interests.

Subject: Medical Sciences, Bioinformatics
Specific subject area: Neurodegeneration, Transcriptomics
Type of data: Differential expression analyses, data quality reports, and visualizations.
How data were acquired: Systematic review of human central nervous system (CNS) transcriptomics datasets from patients with neurodegenerative diseases and healthy controls.
Data format: Analyzed
Parameters for data collection: Datasets were selected based on prespecified inclusion and exclusion criteria.
Description of data collection: Systematic review of publicly available human gene expression datasets from neuropathologically relevant CNS regions of patients with Alzheimer's disease (AD), Lewy body diseases (LBD), and the amyotrophic lateral sclerosis and frontotemporal dementia (ALS-FTD) spectrum, as well as healthy controls. Datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and EMBL-EBI ArrayExpress repositories, followed by rigorous data pre-processing and differential expression analysis.
Data source location: Massachusetts General Hospital, Boston, Massachusetts, United States. Primary data sources are described below.
AD Primary Data Sources: GEO: GSE109887, GSE139384, GSE118553, GSE132903, GSE131617, GSE122063, GSE106241, GSE84422, GSE33000, GSE48350, GSE29378, GSE44770, GSE36980, GSE13214, GSE26972, GSE37263, GSE32645, GSE28146, GSE26927, GSE16759, GSE15222, GSE12685, GSE6834, GSE5281, GSE1297; ArrayExpress: E-MEXP-2280
LBD Primary Data Sources: GEO: GSE77666, GSE49036, GSE43490, GSE54282, GSE34516, GSE23290, GSE28894, GSE26927, GSE24378, GSE20164, GSE20163, GSE20159, GSE19587, GSE20292, GSE20291, GSE20333, GSE20146, GSE20141, GSE8397, GSE7621, GSE7307
ALS-FTD Primary Data Sources: GEO: GSE139384, GSE68605, GSE56500, GSE26927, GSE20589, GSE19332, GSE18920, GSE13162, GSE4595, GSE833; ArrayExpress: E-MTAB-6189, E-MTAB-1925, E-MEXP-2280
Data accessibility: Results of differential expression analyses with data quality reports are available at our Mendeley Data repository (https://doi.org/10.17632/752nd4w7pd). Source code is available on GitHub and has been archived in the Zenodo open-access repository (https://doi.org/10.5281/zenodo.4501047).
Related research article: A. Noori, A.M. Mezlini, B.T. Hyman, A. Serrano-Pozo and S. Das. Systematic review and meta-analysis of human transcriptomics reveals neuroinflammation, deficient energy metabolism, and proteostasis failure across neurodegeneration. Neurobiol. Dis. 149, 2021, 105225, https://doi.org/10.1016/j.nbd.2020.105225.
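
The accession lists under "Data source location" can be tallied programmatically, for example to see how many datasets are listed per disease and which accessions appear under more than one disease. The sketch below is a Python illustration with hypothetical variable names; the accessions are copied from the table above, and no claim is made about how overlapping accessions were counted in the original study.

```python
from collections import Counter

# Accessions as listed under "Data source location".
SOURCES = {
    "AD": [
        "GSE109887", "GSE139384", "GSE118553", "GSE132903", "GSE131617",
        "GSE122063", "GSE106241", "GSE84422", "GSE33000", "GSE48350",
        "GSE29378", "GSE44770", "GSE36980", "GSE13214", "GSE26972",
        "GSE37263", "GSE32645", "GSE28146", "GSE26927", "GSE16759",
        "GSE15222", "GSE12685", "GSE6834", "GSE5281", "GSE1297",
        "E-MEXP-2280",
    ],
    "LBD": [
        "GSE77666", "GSE49036", "GSE43490", "GSE54282", "GSE34516",
        "GSE23290", "GSE28894", "GSE26927", "GSE24378", "GSE20164",
        "GSE20163", "GSE20159", "GSE19587", "GSE20292", "GSE20291",
        "GSE20333", "GSE20146", "GSE20141", "GSE8397", "GSE7621",
        "GSE7307",
    ],
    "ALS-FTD": [
        "GSE139384", "GSE68605", "GSE56500", "GSE26927", "GSE20589",
        "GSE19332", "GSE18920", "GSE13162", "GSE4595", "GSE833",
        "E-MTAB-6189", "E-MTAB-1925", "E-MEXP-2280",
    ],
}

# Number of datasets listed for each disease.
per_disease = {disease: len(accs) for disease, accs in SOURCES.items()}

# How many disease lists each accession appears in.
appearances = Counter(acc for accs in SOURCES.values() for acc in set(accs))

# Accessions listed under more than one disease, with the diseases involved.
shared = {
    acc: sorted(d for d, accs in SOURCES.items() if acc in accs)
    for acc, n in appearances.items()
    if n > 1
}

print("per disease:", per_disease)
print("listed entries:", sum(per_disease.values()))
print("distinct accessions:", len(appearances))
print("shared across diseases:", shared)
```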

References (10 of 14 listed)

1.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data.

Authors:  Rafael A Irizarry; Bridget Hobbs; Francois Collin; Yasmin D Beazer-Barclay; Kristen J Antonellis; Uwe Scherf; Terence P Speed
Journal:  Biostatistics       Date:  2003-04       Impact factor: 5.899

2.  The sva package for removing batch effects and other unwanted variation in high-throughput experiments.

Authors:  Jeffrey T Leek; W Evan Johnson; Hilary S Parker; Andrew E Jaffe; John D Storey
Journal:  Bioinformatics       Date:  2012-01-17       Impact factor: 6.937

3.  A framework for oligonucleotide microarray preprocessing.

Authors:  Benilton S Carvalho; Rafael A Irizarry
Journal:  Bioinformatics       Date:  2010-08-05       Impact factor: 6.937

4.  Asymptotic conditional singular value decomposition for high-dimensional genomic data.

Authors:  Jeffrey T Leek
Journal:  Biometrics       Date:  2010-06-16       Impact factor: 2.571

5.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

6.  The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration.

Authors:  Alessandro Liberati; Douglas G Altman; Jennifer Tetzlaff; Cynthia Mulrow; Peter C Gøtzsche; John P A Ioannidis; Mike Clarke; P J Devereaux; Jos Kleijnen; David Moher
Journal:  BMJ       Date:  2009-07-21

7.  Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: with application to major depressive disorder.

Authors:  Xingbin Wang; Yan Lin; Chi Song; Etienne Sibille; George C Tseng
Journal:  BMC Bioinformatics       Date:  2012-03-29       Impact factor: 3.169

8.  Entrez Gene: gene-centered information at NCBI.

Authors:  Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal:  Nucleic Acids Res       Date:  2006-12-05       Impact factor: 16.971

9.  Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery. (Review)

Authors:  Christopher J Walsh; Pingzhao Hu; Jane Batt; Claudia C Dos Santos
Journal:  Microarrays (Basel)       Date:  2015-08-21

10.  Systematic review and meta-analysis of human transcriptomics reveals neuroinflammation, deficient energy metabolism, and proteostasis failure across neurodegeneration.

Authors:  Ayush Noori; Aziz M Mezlini; Bradley T Hyman; Alberto Serrano-Pozo; Sudeshna Das
Journal:  Neurobiol Dis       Date:  2020-12-19       Impact factor: 5.996
