Exploring reported genes of microglia RNA-sequencing data: Uses and considerations.

Thecla A van Wageningen, Emma Gerrits, Sara Palacin I Bonson, Inge Huitinga, Bart J L Eggen, Anne-Marie van Dam.

Abstract

The advent of RNA-sequencing techniques has made it possible to generate large, unbiased gene expression datasets of tissues and cell types. Several studies describing gene expression data of microglia from Alzheimer's disease or multiple sclerosis have been published, aiming to generate more insight into the role of microglia in these neurological diseases. Though the raw sequencing data are often deposited in open access databases, the most accessible source of data for scientists is what is reported in published manuscripts. We observed a relatively limited overlap in reported differentially expressed genes between various microglia RNA-sequencing studies of multiple sclerosis or Alzheimer's disease. It was clear that differences in experimental set up influenced the number of overlapping reported genes. However, even when the experimental set up was very similar, we observed that overlap in reported genes could be low. We identified that papers reporting large numbers of differentially expressed microglial genes generally showed higher overlap with other papers. In addition, though the pathology present within the tissue used for sequencing can greatly influence microglia gene expression, this pathology was often underreported, making it difficult to assess the data. Whereas reanalyzing every raw dataset could reduce the variation that contributes to the observed limited overlap in reported genes, this is not feasible for labs without (access to) bioinformatic expertise. In this study, we thus provide an overview of data present in manuscripts and their supplementary files and how these data can be interpreted.
© 2021 The Authors. GLIA published by Wiley Periodicals LLC.

Keywords:  Alzheimer's disease; Jaccard index; RNA-seq; bioinformatics; microglia; multiple sclerosis

Year:  2021        PMID: 34409652      PMCID: PMC9291850          DOI: 10.1002/glia.24078

Source DB:  PubMed          Journal:  Glia        ISSN: 0894-1491            Impact factor:   8.073


INTRODUCTION

With RNA‐sequencing (RNA‐seq), gene expression of cells and tissues can be profiled (Tang et al., 2009). In the field of neurobiology, RNA‐seq has contributed to the identification of genes specific for various subsets of glial cells (Hammond et al., 2019) and genes discriminating, for example, microglia from brain infiltrating macrophages (Bennett et al., 2016; O. Butovsky et al., 2014; Hickman et al., 2013). Furthermore, RNA‐seq of cell populations offers the possibility to identify new molecular targets relevant for disease pathology or drug development. There is ample (genetic) evidence that microglia contribute to various neurological disorders, among which Alzheimer's disease (AD) and multiple sclerosis (MS) (Butovsky & Weiner, 2018; Voet et al., 2019). RNA‐seq of microglia from these diseased conditions has identified genes involved in their pathophysiology, and the potential development of treatment for these neurological disorders. Yet, gene expression profiling of large amounts of (individual) cells leads to enormous datasets which can be challenging to untangle and interpret by inexperienced users (Koch et al., 2018). RNA‐seq data are often available on open access platforms such as NCBI GEO and ArrayExpress, but they are not always directly accessible for use, first requiring bioinformatic processing and analyses. The most accessible source of data from RNA‐seq studies is what is reported in published manuscripts and their supplementary materials. Thus, we set out to compare the reported differentially expressed (DE) microglia genes of interest for MS or for AD mouse models and human tissue compared to control tissue in addition to creating an overview of the experimental set up, techniques used and brain tissue used in these various studies. 
Using these data, the aim is to gain more insight into which genes are reported within current microglia RNA‐seq manuscripts, (dis)similarities between reported genes in these manuscripts and how use by other researchers could be facilitated.

METHODS

A literature search was performed on common platforms such as PubMed, Google Scholar and ScienceDirect, using the MeSH terms tool for entries containing the following terms: “RNA,” “RNA‐sequencing,” “MS,” “Multiple Sclerosis,” “AD,” “Alzheimer's disease,” and “microglia.” Papers were filtered by year of publication, and papers published between 2015 and 2021 were included for analysis. Only papers reporting gene expression data from AD or MS mouse models compared to control mice, and from AD or MS human tissue compared to control tissue, were included (see Tables 1 and 2).
TABLE 1

Overview of the model, experimental set up, sequencing method and data availability of AD papers (mouse and human) included in this study

Frigerio et al. (2019), doi: 10.1016/j.celrep.2019.03.099
Model: Female and male C57BL/6J WT and App NL‐G‐F mice
Tissue used and (number of cases = n): Cortex and hippocampus (n = 2 per condition)
Dissociation method: E
Microglia isolation method: Debris Removal Solution gradient followed by FACS of CD11b+ and DAPI
Library prep kit: Modified SmartSeq2 scRNA‐seq
# microglia cells used for analysis (b): 10,801
Reported DE genes (c): 25
Data deposition: GEO NCBI accession number: GSE127893
Data taken from: Main text and supplementary information figure s.5

Keren‐Shaul et al. (2017), doi: 10.1016/j.cell.2017.05.018
Model: C57/BL6‐SJL WT and 5XFAD transgenic mice
Tissue used and (number of cases = n): Whole brains (n = 3 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS of CD45+
Library prep kit: Massively parallel (MARS) scRNA‐seq
# microglia cells used for analysis (b): 8016
Reported DE genes (c): 149
Data deposition: GEO NCBI accession number: GSE98971
Data taken from: Supplementary figures, table s.1 Average Gene Expression in Immune Cell Clusters from the Brain of AD and WT Mice, Related to figure 1 (ordered the gene list upon microglia subtype III, cut off at 150)

Friedman et al. (2018), doi: 10.1016/j.celrep.2017.12.066
Model: Cx3cr1 GFP/+;PS2APP negative mice and Cx3cr1 GFP/+; PS2APP homozygous mice
Tissue used and (number of cases = n): Cortex (n = 3 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD11b+ and DAPI+
Library prep kit: Ovation RNA‐Seq System V2 (NuGEN)
# microglia cells used for analysis (b): NA
Reported DE genes (c): 198
Data deposition: GEO NCBI accession number: GSE89482
Data taken from: Supplementary information, Data s.4 mouse data. Adj. P values <0.05 for PS2APP + 13 mo samples.

Mathys et al. (2017), doi: 10.1016/j.celrep.2017.09.039
Model: CK‐p25 mice compared to control CK mice
Tissue used and (number of cases = n): Hippocampus (n = 3 per condition)
Dissociation method: E
Microglia isolation method: CD11b+ microbeads followed by FACS for CD11b+ and CD45+
Library prep kit: Modified SMART‐Seq2
# microglia cells used for analysis (b): 1685
Reported DE genes (c): 40
Data deposition: GEO NCBI accession number: GSE103334
Data taken from: Main text, figure 3 D, E and F; Supplementary table s.4 cluster 2 versus 3

Holtman et al. (2015), doi: 10.1186/s40478‐015‐0203‐5
Model: Data of aging DBA and BL6‐SJL mice and APP‐PS1 mice compared to intraperitoneal LPS injected mice
Tissue used and (number of cases = n): Whole brain (n = not indicated in the manuscript)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD11b+ and CD45+
Library prep kit: Illumina MouseRef8 bead‐chip microarrays
# microglia cells used for analysis (b): NA
Reported DE genes (c): 195
Data deposition: GEO NCBI accession numbers: GSE74615, GSE43366
Data taken from: Supplementary information, Additional file s3, mouse model specific genes

Srinivasan et al. (2016), doi: 10.1038/ncomms11295
Model: Female PS2APP mice compared to WT mice
Tissue used and (number of cases = n): Cortex (n = 5 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD11b+ and DAPI+
Library prep kit: Ovation RNA‐Seq System V2 (NuGEN)
# microglia cells used for analysis (b): NA
Reported DE genes (c): 78
Data deposition: GEO NCBI accession number: GSE75431
Data taken from: Main text, figure 9.A 13 m.o Tg+ microglia column

Krasemann et al. (2017), doi: 10.1016/j.immuni.2017.08.008
Model: APP‐PS1 mice compared to WT mice
Tissue used and (number of cases = n): Whole brain (n = 2 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for FCRLS+ and CD11b+
Library prep kit: Smart‐Seq2
# microglia cells used for analysis (b): 1000
Reported DE genes (c): 405
Data deposition: GEO NCBI accession number: GSE101689
Data taken from: Supplementary information, table s.1. Microglia Signature during Disease Progression, Related to figure 1, MG in APPPS1 disease and aging

Zhou et al. (2020), doi: 10.1038/s41591‐019‐0695‐9
Model: WT (C57BL/6 J) and 5XFAD (Tg6799) mice
Tissue used and (number of cases = n): Whole brain (n = 3 per condition)
Dissociation method: M
Microglia isolation method: Isolation of frozen nuclei by sucrose
Library prep kit: Droplet‐based 3′ end massively parallel single‐cell RNA sequencing
# microglia cells used for analysis (b): 2251
Reported DE genes (c): 1361
Data deposition: GEO NCBI accession number: GSE140511
Data taken from: Supplementary table 1, cluster 0, p_value_adj <0.05

Sobue et al. (2021), doi: 10.1186/s40478‐020‐01099‐x
Model: AppNL‐G‐F/NL‐G‐F compared to WT mice
Tissue used and (number of cases = n): Cerebral cortex (n = 4 per condition)
Dissociation method: E
Microglia isolation method: Removal of myelin using Myelin Removal Beads II, followed by incubation with magnetic beads for CD16/CD32+, CD11b+
Library prep kit: TruSeq (Stranded) mRNA
# microglia cells used for analysis (b): NA
Reported DE genes (c): 3318
Data deposition: NA
Data taken from: Supplementary table 1

Zhou et al. (2020) (a), doi: 10.1038/s41591‐019‐0695‐9
Model: AD patients and controls
Tissue used and (number of cases = n): Prefrontal cortical tissue (n = 3 per condition)
Dissociation method: M
Microglia isolation method: Isolation of frozen nuclei by sucrose
Library prep kit: Droplet‐based 3′ end massively parallel single‐cell RNA sequencing
# microglia cells used for analysis (b): 3986
Reported DE genes (c): 565
Data deposition: AD Knowledge Portal (https://adknowledgeportal.org) under study snRNAseqAD_TREM2; also accessible through https://doi.org/10.7303/syn21125841
Data taken from: Supplementary table 4, micro, p_value_adj <0.05

Grubman et al. (2019) (a), doi: 10.1038/s41593‐019‐0539‐4
Model: AD patients and non‐demented controls
Tissue used and (number of cases = n): Entorhinal cortex (n = 6 per condition)
Dissociation method: M
Microglia isolation method: Purification of extracted nuclei by cell sorting using DAPI+
Library prep kit: NextSeq 500 (Illumina)
# microglia cells used for analysis (b): 449
Reported DE genes (c): 29
Data deposition: GEO NCBI accession number: GSE138852
Data taken from: Supplementary table 2

Gerrits et al. (2021) (a), doi: 10.1007/s00401‐021‐02263‐w
Model: AD patients and non‐demented controls
Tissue used and (number of cases = n): Gray matter of the occipital cortex (OC) and occipitotemporal cortex (OTC) (n = 10 AD, n = 8 control)
Dissociation method: E
Microglia isolation method: Purification of extracted frozen nuclei by FACS for DAPIposNEUNnegOLIG2neg
Library prep kit: 3′ single cell RNAseq from 10x Genomics
# microglia cells used for analysis (b): 150,000
Reported DE genes (c): 2898
Data deposition: GEO NCBI accession number: GSE148822
Data taken from: Supplementary table 7, AD1 and AD2 vs. homeostatic, p_value_adj <.05.

Mathys et al. (2019) (a), doi: 10.1038/s41586‐019‐1195‐2
Model: AD patients with pathology and controls with limited to no pathology
Tissue used and (number of cases = n): Prefrontal cortex (n = 24 per condition)
Dissociation method: M
Microglia isolation method: Purification of extracted frozen nuclei
Library prep kit: NextSeq 500/550 High Output v2
# microglia cells used for analysis (b) / Reported DE genes (c): 955110
Data deposition: https://www.radc.rush.edu/docs/omics.htm (snRNA‐seq PFC) or at Synapse (https://www.synapse.org/#!Synapse:syn18485175) under the doi 10.7303/syn18485175
Data taken from: Supplementary table 2, Sheet “Mic” comparison of no‐pathology to pathology

Srinivasan et al. (2020) (a), doi: 10.1016/j.celrep.2020.107843
Model: AD patients compared to control
Tissue used and (number of cases = n): Frozen superior frontal gyrus and fusiform gyrus tissue blocks (n = 10 AD, n = 15 control)
Dissociation method: M
Microglia isolation method: FACS sorting for GFAP+, NeuN+, CD31+, CD11b+ DAPI+
Library prep kit: Illumina HiSeq2500
# microglia cells used for analysis (b): NA
Reported DE genes (c): 75
Data deposition: GEO NCBI accession number: GSE125050
Data taken from: Supplementary table 2, section GSE125050, diagnosis: AD—Control|celltype = myeloid, p value adj <0.05.

Abbreviations: APP, amyloid precursor protein; E, enzymatic; FACS, fluorescent activated cell sorting; M, mechanical; NA, not applicable; scRNA‐seq, single cell RNA sequencing; WT, wild type.

(a) Human data.

(b) For single‐cell sequencing studies; NA, bulk sequencing was employed.

(c) Total of enriched and depleted genes.

TABLE 2

Overview of the model, experimental set up, sequencing method and data availability of MS papers (mouse and human) included in this study

Mendiola et al. (2020), doi: 10.1038/s41590‐020‐0654‐0
Model: Healthy C57Bl/6 mice compared to chronic EAE induced by MOG35–55 in C57Bl/6 mice
Tissue used and (number of cases = n): Spinal cord (n = 2 per condition)
Dissociation method: M followed by myelin removal by magnetic beads
Microglia isolation method: FACS sorting of CD11b+
Library prep kit: Chromium Single Cell 3′ v2 Reagent kit
# microglia cells used for analysis (b) / Reported DE genes (b): 87011041
Data deposition: GEO NCBI accession number: GSE146295
Data taken from: Supplementary information, Supp_table_4 selected genes from clusters MgIII, MgIV, MgV from EAE microglia.

Hammond et al. (2019), doi: 10.1016/j.immuni.2018.11.004
Model: Control C57BL/6J mice compared to C57BL/6J mice injected with LPC
Tissue used and (number of cases = n): Microdissected white matter (n = 3 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD11b+, CD45+ and Cx3cr1+
Library prep kit: Chromium single cell gene expression platform
# microglia cells used for analysis (b): 76,000
Reported DE genes (b): 196
Data deposition: GEO NCBI accession number: GSE121654
Data taken from: Supplementary information, table s.1 from cluster 9

Jordão et al. (2019), doi: 10.1126/science.aat7554
Model: Control C57BL/6N mice compared to C57Bl/6N induced with EAE (MOG35–55 EAE)
Tissue used and (number of cases = n): Whole brain (n = 5–6 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD45+ and CD11b+
Library prep kit: mCEL‐Seq2
# microglia cells used for analysis (b) / Reported DE genes (b): 346129
Data deposition: GEO NCBI accession number: GSE118948
Data taken from: Main text, from figure 3.F

Krasemann et al. (2017), doi: 10.1016/j.immuni.2017.08.008
Model: Control SJL/J female mice compared to SJL/J female mice induced with EAE (PLP139‐151)
Tissue used and (number of cases = n): Spinal cord (n = 2 per condition)
Dissociation method: M
Microglia isolation method: Percoll gradient followed by FACS for CD11b+
Library prep kit: Smart‐Seq2
# microglia cells used for analysis (b): 1000
Reported DE genes (b): 544
Data deposition: GEO NCBI accession number: GSE101689
Data taken from: Supplementary information, table s.1. Microglia Signature during Disease Progression, Related to figure 1, MG in EAE disease stages.

Schirmer et al. (2019) (a), doi: 10.1038/s41586‐019‐1404‐z
Model: MS patients and non‐demented controls
Tissue used and (number of cases = n): Normal appearing and demyelinated subcortical WM and GM tissue (n = 12 MS, n = 6 control)
Dissociation method: Isolated nuclei from snap‐frozen tissue
Microglia isolation method: No prior selection for microglia: microglial subpopulations were identified from overall scRNA‐seq data based on known microglial markers
Library prep kit: 10x Genomics Single‐Cell 3′ system
# microglia cells used for analysis (b) / Reported DE genes (b): 168325
Data deposition: Sequence Read Archive under accession number: PRJNA544731
Data taken from: Supplementary table 6, filtered by celltype microglia

Van der Poel et al. (2019) (a), doi: 10.1038/s41467‐019‐08976‐7
Model: MS patients and non‐demented controls
Tissue used and (number of cases = n): Normal appearing GM (occipital lobe) and WM (corpus callosum) (n = 15 MS, n = 16 control)
Dissociation method: E
Microglia isolation method: Percoll gradient followed by CD11b+ magnetic beads and FACS for CD45+, CD11b+ and CD15
Library prep kit: NEBNext Ultra Directional RNA Library Prep Kit for Illumina
# microglia cells used for analysis (b): NA
Reported DE genes (b): 177
Data deposition: GEO NCBI accession number: GSE111972
Data taken from: Supplementary information, Supplementary table 2, pooled data from Excel sheets comparing DE MS WM vs. control WM up and down and DE MS GM vs control GM up and down.

Jäkel et al. (2019) (a), doi: 10.1038/s41586‐019‐0903‐2
Model: MS patients and non‐demented controls
Tissue used and (number of cases = n): Macrodissected normal appearing white matter and white matter lesions (n = 5 for MS and for control)
Dissociation method: M
Microglia isolation method: Strained with a 30 μm strainer followed by debris removal solution
Library prep kit: Chromium Single Cell 3′ Library and Gel Bead kit v2
# microglia cells used for analysis (b) / Reported DE genes (b): 428163
Data deposition: European Genome‐phenome Archive (EGA): EGAS00001003412
Data taken from: Supplementary table 3

Abbreviations: E, enzymatic; EAE, experimental autoimmune encephalomyelitis; GM, gray matter; LPC, lysophosphatidylcholine; M, mechanical; MOG, myelin oligodendrocyte glycoprotein; PLP1, proteolipid protein; WM, white matter.

(a) Human studies.

(b) Both enriched and depleted.

Of note, in this manuscript the term “RNA‐seq” refers to both bulk and single cell RNA‐seq of microglia subjected to differential gene expression analysis.

Construction of a core gene set from mouse model and human tissue microglia RNA‐seq papers

Per paper (for AD or MS studies), a list of reported genes was derived from the main body of the paper or from the supplementary files. A “core gene set” was then compiled of genes present in at least two gene lists of the same disease. This core gene set was compared to each per‐paper gene list using the COUNTIF function in Microsoft Excel to determine gene overlap, and the result is represented as a binary matrix indicating the presence (1) or absence (0) of each gene reported per paper (Tables [Link], [Link]).
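
The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' workflow, which used Microsoft Excel; the function names and the toy gene lists are hypothetical and are not data from the reviewed papers.

```python
from collections import Counter


def core_gene_set(gene_lists):
    """Return the genes reported in at least two of the per-paper gene lists."""
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # set(): count a gene once per paper
    return sorted(gene for gene, n in counts.items() if n >= 2)


def binary_matrix(gene_lists, core):
    """Map each paper to a 0/1 vector over the core gene set (1 = reported)."""
    matrix = {}
    for paper, genes in gene_lists.items():
        reported = set(genes)
        matrix[paper] = [1 if gene in reported else 0 for gene in core]
    return matrix


# Hypothetical toy input; paper names and lists are placeholders.
toy_lists = {
    "paper_A": ["Apoe", "Trem2", "Clec7a"],
    "paper_B": ["Apoe", "Itgax"],
    "paper_C": ["Trem2", "Apoe", "Cst7"],
}
core = core_gene_set(toy_lists)
matrix = binary_matrix(toy_lists, core)
```

In this toy case only Apoe and Trem2 occur in two or more lists, so each paper is reduced to a two-element presence/absence vector.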

Commonality assessment

From these binary matrices, Jaccard indices were calculated using the jaccard_score function from the sklearn.metrics module of scikit‐learn for Python (Python Software Foundation, https://www.python.org/). The Jaccard index is a measure suited to comparing the overlap between two binary lists (Chung et al., 2019). It is calculated as the number of microglia genes from the core gene set that are present in both gene lists (i.e., from two papers) divided by the number of core genes present in at least one of the two gene lists. It indicates the similarity between binary lists, where 1 indicates complete overlap between the gene lists and 0 indicates a complete lack of overlap. As the Jaccard index takes into account the length of the gene lists, comparison of papers with a considerable difference in number of reported genes may result in relatively low Jaccard indices. For example, gene list A of 100 genes present in the core gene set may show complete overlap with gene list B, which features 1000 genes in the core gene set, yet the Jaccard index will be relatively low: the 100 shared genes are divided by the 1000 distinct genes present across the two lists, giving a Jaccard index of 100/1000 = 0.1. The Jaccard index thus also takes into account the 900 genes reported in gene list B which were not reported in gene list A. A heatmap of the Jaccard indices was created in R (R Core Team, 2020) using the heatmap.2 function from the gplots package (Warnes et al., 2009).
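
To make the effect of list length concrete, the sketch below computes the Jaccard index of two presence/absence vectors using the standard intersection-over-union definition. It is a dependency-free illustration with a hypothetical function name, not the authors' script. For 0/1 vectors, the default binary mode of scikit-learn's jaccard_score should give the same value.

```python
def jaccard_index(row_a, row_b):
    """Jaccard index of two equal-length 0/1 vectors: |A and B| / |A or B|."""
    if len(row_a) != len(row_b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    # Two all-zero vectors share nothing; return 0.0 rather than divide by zero.
    return both / either if either else 0.0


# Toy vectors over a 1000-gene core set: list A holds 100 genes,
# all of which are also among the 1000 genes of list B.
list_a = [1] * 100 + [0] * 900
list_b = [1] * 1000
nested = jaccard_index(list_a, list_b)
```

Here `nested` is 100/1000 = 0.1 even though every gene in list A is also in list B, which is why a short gene list can score low against a long one despite full containment.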

Comparison of mouse model data to human data

To compare mouse and human gene expression data, a core gene set of human tissue related genes was constructed as described above (see Tables S3 and S4). Subsequently, we used the COUNTIF function in Microsoft Excel to identify the presence or absence of each gene of the human core gene set in each mouse model paper. Homology between human and rodent genes was not assessed. Heatmaps of human genes reported in mouse model papers were created in R (R Core Team, 2020) using the heatmap.2 function from the gplots package (Warnes et al., 2009).
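
A presence/absence lookup of this kind might be sketched as follows. The case-insensitive symbol matching is an assumption of this sketch (so that a mouse symbol such as Apoe can match the human symbol APOE); the text does not state how symbol case was handled, and no orthology mapping is performed here either. The function name and toy lists are hypothetical.

```python
def presence_table(human_core, mouse_lists, ignore_case=True):
    """For each mouse paper, flag (1/0) which human core genes it reports.

    ignore_case=True is an assumption of this sketch, not a documented step
    of the original analysis. Matching is by gene symbol only.
    """
    def norm(symbol):
        return symbol.upper() if ignore_case else symbol

    table = {}
    for paper, genes in mouse_lists.items():
        reported = {norm(g) for g in genes}
        table[paper] = {gene: int(norm(gene) in reported) for gene in human_core}
    return table


# Hypothetical toy input; not data from the reviewed papers.
human_core = ["APOE", "SPP1", "GPNMB"]
mouse_lists = {
    "mouse_paper_1": ["Apoe", "Spp1", "Clec7a"],
    "mouse_paper_2": ["Cst7"],
}
table = presence_table(human_core, mouse_lists)
strict = presence_table(human_core, mouse_lists, ignore_case=False)
```

With exact, case-sensitive matching (`strict`), none of the toy mouse symbols match the human ones, which shows how much the outcome of such a comparison can depend on symbol handling.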

Pathway analysis of reported genes

Gene lists of reported genes derived from each paper were used for pathway analysis using the g:Profiler website (Raudvere et al., 2019). Commonality of the top 15 enriched GO biological process (GO BP) terms was assessed using the COUNTIF function in Microsoft Excel (Tables [Link], [Link]).
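
The commonality count can be illustrated with a short Python sketch that tallies in how many studies each enriched term appears. The enrichment itself (done on the g:Profiler website) is not reproduced; the function names, study names and term lists below are hypothetical toy input.

```python
from collections import Counter


def pathway_commonality(top_terms_per_study):
    """Count in how many studies each enriched GO term appears."""
    counts = Counter()
    for terms in top_terms_per_study.values():
        counts.update(set(terms))  # set(): count each term once per study
    return counts


def summarize(counts):
    """Return {k: number of unique terms found in exactly k studies}."""
    return dict(sorted(Counter(counts.values()).items()))


# Hypothetical toy input: top enriched terms per study.
toy_terms = {
    "study_1": ["immune system process", "response to external stimulus", "cell activation"],
    "study_2": ["immune system process", "response to external stimulus"],
    "study_3": ["immune system process", "lipid metabolic process"],
}
counts = pathway_commonality(toy_terms)
summary = summarize(counts)
```

The summary dictionary gives statements of the form "n of the unique pathways were enriched in exactly k studies", which is how the pathway overlap is reported in the Results.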

RESULTS

The included papers reported on RNA‐seq data of microglia from control CNS compared to diseased CNS in the main body of the text or in the supplementary data. Of these, nine papers presented data obtained from mouse models of AD (Friedman et al., 2018; Frigerio et al., 2019; Holtman et al., 2015; Keren‐Shaul et al., 2017; Krasemann et al., 2017; Mathys et al., 2017; Sobue et al., 2021; Srinivasan et al., 2016; Zhou et al., 2020) and four papers presented data on mouse models of MS (Hammond et al., 2019; Jordão et al., 2019; Krasemann et al., 2017; Mendiola et al., 2020). In addition, five papers reported on microglia RNA‐seq data from human AD tissue compared to control tissue (Gerrits et al., 2021; Grubman et al., 2019; Mathys et al., 2019; Srinivasan et al., 2020; Zhou et al., 2020) and three papers presented RNA‐seq data from human MS tissue compared to control tissue (Jäkel et al., 2019; Schirmer et al., 2019; van der Poel et al., 2019). An overview of the selected papers and the experimental procedures used therein can be found in Tables 1 and 2.

Commonality of reported genes in mouse models of AD compared to control

Commonality of reported genes was assessed by calculating the Jaccard index (Chung et al., 2019) of genes reported in two or more papers (Table S1). A value of 1 indicates complete overlap in reported genes between two papers, and a value of 0 indicates no overlap. We observed very limited overlap in reported microglial DE genes between all AD mouse model papers (Figure 1(a)), with a complete lack of overlap between, for example, Mathys et al. (2017) and Zhou et al. (2020). The largest overlap was observed between Sobue et al. (2021) and Zhou et al. (2020) (Jaccard index of 0.79, Figure 1(a)), which was surprising as these two studies used different AD mouse models, different experimental procedures and different sequencing methods (Table 1). Interestingly, though Srinivasan et al. (2016) and Friedman et al. (2018) used similar mouse models, brain areas, dissociation methods and microglia isolation methods (Table 1), they only showed a Jaccard index of 0.2 (Figure 1(a)), which can be considered relatively low. Even though Zhou et al. (2020) and Sobue et al. (2021) showed a high Jaccard index between them, their Jaccard indices with other papers were relatively low (ranging from 0 to 0.12, Figure 1(a)). Many genes reported by, for example, Sobue et al. (2021) and Keren‐Shaul et al. (2017) were represented in the top 25 most reported microglia DE genes (Figure 1(b)), even though their Jaccard index was only 0.1 (Figure 1(a)). Thus, whereas Sobue et al. (2021) and Keren‐Shaul et al. (2017) reported many of the same DE genes, Sobue et al. (2021) reported many other genes not reported in Keren‐Shaul et al. (2017), lowering the Jaccard index (Table S1). The top 25 most reported genes included well known genes related to AD such as Clec7a, Trem2, Apoe, and Itgax. Lastly, most papers included in this study used single‐cell sequencing methods and used about 2–3 animals per condition. Still, we observed considerable variation in the number of microglia cells sequenced (ranging from ~1000 to 10,801 cells, Table 1), but we observed no direct relation between the number of cells sequenced and the number of DE microglia genes reported in a paper. Enriched pathways present within gene lists taken from all AD mouse model studies also showed considerable differences between papers. Overall, only two GO‐pathways (“response to external stimulus” and “immune system process”) were enriched in 4 out of the 9 studies included within this study, whereas 79 of the total of 104 unique pathways were found enriched in one study only (Table S1). Together, the data indicate that there is little overlap in reported DE microglia genes between various AD mouse model studies, which is at least partly related to differing experimental set ups and the number of reported microglia genes per paper. In addition, the lack of overlap in enriched pathways corroborates that the lack of overlap in reported genes may influence conclusions about microglia function drawn from each paper.
FIGURE 1

Overview of the overlap in microglia DE genes reported in papers investigating mouse models for AD and MS. (a) Jaccard indices indicating overlap of reported genes of the 9 included AD mouse model papers. (b) Heatmap of the top 25 genes reported in at least two AD mouse model papers. Red boxes indicate that the gene was reported in the paper, gray boxes indicate an absence in reporting of the gene. (c) Jaccard indices indicating the overlap of reported genes of the 4 included MS mouse model papers. (d) Heatmap of the top 25 genes reported in at least two MS mouse model papers. Green boxes indicate that the gene was reported in the paper, gray boxes indicate an absence in reporting of the gene.

Commonality of reported genes in mouse models of MS compared to control

Similar to data from AD mouse models, we observed a limited overlap in reported microglia DE genes in MS mouse model RNA‐seq papers (Figure 1(c), Table S2). Of note, compared to AD, we found fewer papers describing microglia RNA‐seq data of MS mouse models, making it more difficult to assess the possible cause of the limited overlap in reported genes. A high Jaccard index (0.69) was found between the data of Hammond et al. (2019) and Mendiola et al. (2020), even though these studies used different MS mouse models and isolated microglia from different tissues (Table 2). Conversely, a very low Jaccard index (0.09, Figure 1(c)) was found between Mendiola et al. (2020) and Jordão et al. (2019), even though these papers used the same MS mouse model and microglia isolation methods, though they did use different tissues (Table 2). When looking at the top 25 reported microglia DE genes (including Apoe, Cxcl10, Cst7, Ccl5, Ccl4, Ccl2), we observed more overlap between Krasemann et al. (2017) and Mendiola et al. (2020), while Jordão et al. (2019) reported the fewest genes present in the top 25 DE genes (Figure 1(d)). The lack of overlap in DE genes reported by Jordão et al. (2019) with the other studies may partly be related to the relatively low number of DE genes reported by Jordão et al. (2019) (Table 2). Analysis of enriched GO‐pathways of gene lists showed that 2 pathways were enriched in 3 out of 4 studies (“cellular response to chemical stimulus” and “response to external stimulus”) and 9 out of 47 unique pathways were present in 2 out of 4 studies (Table S2). Similar to what was observed in AD mouse model papers, there is little overlap in reported DE genes and enriched pathways between MS mouse model studies, which could be due to experimental set up (such as the brain region studied) but is likely also influenced by differences between the papers in the numbers of reported microglia DE genes.

Commonality of reported genes data from human tissue of AD to control and to mouse model data

There was slightly more overlap in reported DE genes in human AD RNA‐seq papers than in papers on mouse models of AD (Table S3). Based on the human core gene set, we found that Gerrits et al. (2021) and Zhou et al. (2020) reported almost exactly the same genes (Jaccard index = 0.97, Figure 2(a)). Additionally, there was a large overlap in genes reported by Gerrits et al. (2021) and Mathys et al. (2019) (Jaccard index = 0.7, Figure 2(a)). These three studies all reported single cell RNA‐seq data from frozen microglia nuclei, though they used different tissue dissociation methods (Table 1). In contrast, Srinivasan et al. (2020) used a different experimental set‐up and showed almost no overlap in reported DE genes with other papers (Figure 2(a); Table 1). Despite using similar research methods as Gerrits et al. (2021) and Zhou et al. (2020), Grubman et al. (2019) showed little overlap in reported DE genes with other papers (Jaccard index = 0.2, Figure 2(a)). Possibly, this could be related to the lower number of cells sequenced by Grubman et al. (2019) (449 compared to 3986 and 150,000 cells for Zhou et al. (2020) and Gerrits et al. (2021), respectively). The top 25 most reported microglia DE genes (Figure 2(b)) were dominated by the overlap in reported genes between Mathys et al. (2019), Gerrits et al. (2021) and Zhou et al. (2020). Similar to the AD mouse model data, most studies utilized single‐cell sequencing but reported a large range in the number of cells/nuclei sequenced (449–150,000, Table 1). Taken together, these data indicate slightly better overlap in reported microglia DE genes between AD human tissue studies, most probably because several studies used very similar experimental set ups. Furthermore, GO‐pathway analysis showed that no pathway was enriched in a majority of the studies. This could be due to the absence of enriched pathways for the gene list derived from Srinivasan et al. (2020). In total, we found 43 unique enriched pathways, of which 5 were present in 2 out of 5 included studies (Table S3).
FIGURE 2

Overview of microglia DE genes reported in studies using AD and MS human tissue and the comparison of mouse model data to human data. (a) Jaccard indices indicating overlap in reported genes of the 5 included studies reporting data from AD human tissue. (b) Heatmap of the top 25 genes reported in at least 2 human AD tissue studies. Orange boxes indicate that the gene was reported in the paper, gray boxes indicate an absence in reporting of the gene. (c) Heatmap of genes reported in at least 2 AD mouse model papers and 2 AD human tissue papers. Pink boxes indicate that the gene was reported in the paper, gray boxes indicate an absence in reporting of the gene. (d) Venn diagram of the number of genes reported by papers showing data of human MS tissue. (e) Heatmap of genes reported in at least two MS mouse model papers and at least 2 human MS tissue papers.

To assess the overlap of reported genes in AD mouse models with AD human tissue data, we compared the microglia DE genes reported by at least 2 mouse model papers with the DE genes reported in at least 2 human studies, generating a list of genes mentioned in both mouse model and human tissue studies. Generally, reported genes of mouse models overlapped poorly with human tissue data. For example, the microglia DE gene most reported in mouse model papers (Clec7a) was not mentioned in at least two human RNA‐seq studies (Figures 1(b) and 2(c)). The AD mouse model paper showing the most overlap in reported microglia DE genes with AD human tissue was that of Sobue et al. (2021) (Figure 2(c)).

Commonality of reported gene data from human MS tissue compared to control and to mouse model data

The three papers reporting sequencing data from human MS tissue microglia compared to control tissue showed very little overlap in reported DE genes. With only these three papers, we did not calculate Jaccard indices; instead, we plotted a Venn diagram. Schirmer et al. (2019) and van der Poel et al. (2019) showed no overlap in reported genes (Figure 2(d) and Table S4). The reported genes overlapping between Jäkel et al. (2019) and Schirmer et al. (2019) were ACSL1, SLC1A3, KCNQ3, PTPRJ, SYNDIG1, FKBP5, and ASAH1 (Table S4), and the reported genes overlapping between Jäkel et al. (2019) and van der Poel et al. (2019) were CXCR4, GPNMB, SPP1, SLCO2B1, CSF1R, and RHBDF2 (Table S4). The lack of overlap could be due to the very different experimental set ups, such as the microglia isolation and sequencing methods. In addition, microglia were isolated from different brain areas, and the pathological characterization of the MS tissue was unclear (Table 2). We also observed a limited number of enriched GO‐pathways, with only one pathway found in two studies (“cell activation,” Table S4). It must be noted that although Schirmer et al. (2019) and Jäkel et al. (2019) feature microglia RNA‐seq data in their supplementary files, microglial gene expression was not a primary outcome of their studies. Thus, it could be that their methods were not optimized to detect microglial gene expression. When comparing the genes mentioned by van der Poel et al. (2019), Schirmer et al. (2019), or Jäkel et al. (2019) to the genes reported in at least two mouse model studies, we found only a limited number of overlapping microglia DE genes, with Jordão et al. (2019) reporting no genes that were also reported in MS human tissue studies (Figure 2(e)).
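The regions of a three-set Venn diagram such as the one described above can be computed directly from the gene lists. The sketch below is illustrative only and makes a strong simplifying assumption: each study's set contains only the overlapping genes named in the text, because the genes unique to each study are not listed here (they are in Table S4). The helper name `venn_regions` is hypothetical.

```python
from itertools import combinations

def venn_regions(sets):
    """Map each combination of study labels to the genes found in exactly
    those studies and in no other study."""
    labels = list(sets)
    regions = {}
    for r in range(1, len(labels) + 1):
        for combo in combinations(labels, r):
            # Genes present in every study of this combination...
            inside = set.intersection(*(sets[label] for label in combo))
            # ...minus genes that also occur in any study outside it.
            outside = set().union(*(sets[l] for l in labels if l not in combo))
            regions[combo] = inside - outside
    return regions

# Overlapping genes as named in the text (Table S4 holds the full lists).
jakel_schirmer = {"ACSL1", "SLC1A3", "KCNQ3", "PTPRJ", "SYNDIG1", "FKBP5", "ASAH1"}
jakel_vdpoel = {"CXCR4", "GPNMB", "SPP1", "SLCO2B1", "CSF1R", "RHBDF2"}

# ASSUMPTION: study-specific genes are omitted, so single-study regions
# come out empty here; with the full lists they would not.
reported = {
    "Jakel": jakel_schirmer | jakel_vdpoel,
    "Schirmer": set(jakel_schirmer),
    "van der Poel": set(jakel_vdpoel),
}

regions = venn_regions(reported)
for combo, genes in regions.items():
    print(" & ".join(combo), len(genes))
```

With the full gene lists substituted for the partial sets, the same function would give the counts for every region of the diagram, including the genes unique to each study.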

DISCUSSION

In the present study we provide an overview of the microglia DE genes reported within the manuscripts and supplementary files of current microglia RNA‐seq studies in MS or AD rodent or human tissue. We created this overview with the aim of assessing which RNA‐seq data are readily available from the manuscript, how comparable they are between manuscripts, and what researchers need to be aware of when using reported RNA‐seq data. Overall, we observed very limited overlap in reported microglia DE genes among various microglia RNA‐seq studies on AD or on MS. Moreover, the limited overlap in enriched GO‐pathways in the reported DE genes per study indicates that different gene lists could also lead to different inferences of microglia function per study. This was especially the case for AD mouse‐model studies and for human MS tissue studies. Factors that may influence the overlap in reported genes between studies are discussed below.

Experimental set up before sequencing

The differences in reported genes between the various currently available microglia RNA‐seq studies could for a large part be attributed to differences in experimental set‐up. Considerable overlap was observed between papers reporting DE gene data on human AD tissue with very similar experimental set up, that is, all reported single‐cell sequencing data from extracted frozen nuclei of human cortical brain areas using the same sequencing method (Gerrits et al., 2021; Grubman et al., 2019; Zhou et al., 2020). In contrast, the only study reporting bulk‐sequencing of FACS sorted microglia from human AD tissue (Srinivasan et al., 2020) showed considerably less overlap in reported genes. This indicates that gene expression data generated by RNA‐seq are likely influenced by the experimental set up being used, with similar research designs leading to more similar gene expression results (Stark et al., 2019). For mouse model studies, we observed considerably more variation in the experimental set up. For example, AD mouse model studies used either different APP‐mutation models (e.g., APP‐PS1 or AppNL‐G‐F) or various mouse models of neurodegeneration (e.g., CK‐p25 or 5XFAD). Additionally, the experimental procedures before sequencing differed between studies. These included, for example, the brain region dissected, the method of microglia isolation, and the library preparation kit used prior to sequencing. Strikingly, however, two studies reporting almost the exact same experimental set up (Friedman et al., 2018; Srinivasan et al., 2016) still showed minimal overlap in reported genes. In a similar vein, we observed that the two MS mouse model studies showing the most overlap in reported genes isolated microglia from completely different mouse models (EAE vs. LPC induced demyelination) and tissue (spinal cord vs. dissected white matter).
Taken together, this suggests that whereas it is important to minimize set‐up and analysis variation in studies investigating microglia gene expression in the context of AD or MS, other factors may also be at play, contributing to the lack of overlap in reported genes.

Different requirements for reporting a gene

The large heterogeneity in reported DE genes may make it difficult for other researchers to assess the usefulness of the data currently reported within manuscripts. Our results were partially influenced by the length of the gene lists reported by papers. Papers reporting many different DE genes, for example, within an Excel file in the supplementary materials, often showed more overlap with each other. Thus, if a paper reports only highly significantly regulated genes or genes with a high fold change, this could decrease its overlap in reported genes with other papers. This does not necessarily mean that papers reporting a lower number of genes under‐report data: whether a gene is reported could be influenced by factors such as the specific focus or aims of the paper, the research method used (e.g., bulk RNA‐seq compared to scRNA‐seq), or restrictions on the length of supplementary materials set by the journal. The selection criteria used to report genes could also differ between papers. For example, only genes with a log fold change of 2 or higher in either direction, only genes that showed highly statistically significant DE compared to control, or various combinations of the two could be reported. To increase the usability of data for other researchers, it would be advisable to include data, for example in the supplementary files, without too many restrictions. Reporting all data, already within the manuscript itself, allows other researchers to set their own filters and focus on their specific gene(s) of interest.
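The effect of reporting thresholds on overlap can be illustrated with a toy calculation. The sketch below is a constructed example, not data from any study: the two DE tables, the gene labels, and the function names are all hypothetical, and the numbers were chosen to show how a stricter fold-change cut-off can shrink two reported lists until they no longer overlap. It does not show that this always happens. The Jaccard index used here is the size of the intersection divided by the size of the union.

```python
def jaccard(a, b):
    """Jaccard index of two gene collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reported_genes(de_table, max_padj=0.05, min_abs_lfc=0.0):
    """Genes a paper would report under the given cut-offs.

    `de_table` maps gene -> (log fold change, adjusted p value).
    Direction of change is ignored in this simplified filter.
    """
    return {
        gene
        for gene, (lfc, padj) in de_table.items()
        if padj < max_padj and abs(lfc) >= min_abs_lfc
    }

# Hypothetical DE tables for two studies (illustrative values only).
study_a = {"G1": (3.1, 0.001), "G2": (1.2, 0.01), "G3": (0.8, 0.03),
           "G4": (2.5, 0.2), "G5": (-1.5, 0.04)}
study_b = {"G1": (1.4, 0.002), "G2": (2.6, 0.01), "G3": (0.9, 0.02),
           "G5": (-2.2, 0.001), "G6": (0.7, 0.04)}

# Lenient reporting: every gene with adjusted p < 0.05.
lenient = jaccard(reported_genes(study_a), reported_genes(study_b))

# Strict reporting: additionally require |log fold change| >= 2.
strict = jaccard(reported_genes(study_a, min_abs_lfc=2.0),
                 reported_genes(study_b, min_abs_lfc=2.0))

print(lenient, strict)
```

Because both tables are kept in full, a reader can rerun `reported_genes` with any other cut-off, which is the practical benefit of reporting complete gene lists.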

Pathology of animal models and human tissue

We also observed limited commonality in reported genes between papers describing mouse‐model data and data from human tissue. For example, disease associated microglia (DAM) genes were first described in an impactful paper by Keren‐Shaul et al. (2017) using a mouse model of AD. Microglia expressing high levels of DAM genes were identified as phagocytic cells responding to amyloid beta plaques, and were later also shown to be present in, for example, other neurological diseases (Böttcher et al., 2019) or during neurodegeneration (Anderson et al., 2019). Since then, we have observed many references to this gene set in various (AD) microglia RNA‐seq papers. However, microglia RNA‐seq data from fresh human AD tissue did not show differential expression of these DAM genes compared to control (Alsema et al., 2020). Various explanations are possible, including that post‐mortem human AD tissue is often taken at the end stage of the disease whereas mouse models often mimic relatively early pathological events. However, as postulated by Gerrits et al. (2021), it could also be that amyloid plaque associated microglia are difficult to isolate from fresh human tissue because of their location within or between the plaques. Thus, whereas we did find some overlap in the reporting of specific DAM related genes in human and mouse, we found a wealth of genes that are reported in mouse models of disease but not in the human counterpart. It remains to be elucidated whether these genes indicate pathological differences between human and mouse. Until then, mouse model data should be considered as genes regulated in a brain featuring specific pathology, such as the formation of amyloid beta plaques inducing, for example, phagocytic DAM.
Concurrently, RNA‐seq data from human AD tissue are of interest to identify microglia phenotypes present at late stages of disease, characterized by amyloid beta plaques, hyperphosphorylated tau and substantial neurodegeneration (Spires‐Jones & Hyman, 2014), which may induce a different microglia phenotype (Gerrits et al., 2021). To determine the use of the microglia RNA‐seq data presented within a published manuscript, it is therefore important that the pathology present within the tissue used is extensively described, such that it is clear which microglia function or response may be reflected in the gene expression data. We observed that pathological data, especially of animal models, are often not extensively reported. This is perhaps most clear in the limited number of MS microglia sequencing studies currently published. For example, it is known that the EAE mouse model of MS, described in two of the four manuscripts used in this study (Jordão et al., 2019; Mendiola et al., 2020), leads to some demyelination in white matter areas such as the corpus callosum and spinal cord white matter, but much less in gray matter areas. Yet, both studies describing data from the EAE mouse model used either the whole brain or the whole spinal cord. In addition, the inflammatory status of the demyelinated areas is often not reported. Therefore, it is important to realize that these data are indicative of the total microglia response possibly present in the EAE brain, but not specifically of demyelinated brain areas, in which the microglia response may be different. This is substantiated by the observation that, although most mouse model studies showed little overlap in reported DE genes with human MS tissue studies, the genes that did overlap were found in normal‐appearing (i.e., non‐demyelinated) brain areas of patients with MS (van der Poel et al., 2019).
Surprisingly, although a role for microglia in MS pathology is quite clear (O'Loughlin et al., 2018), there are fewer studies reporting microglia RNA‐seq data of MS tissue than of AD tissue. For the human tissue data in particular, van der Poel et al. (2019) is the only study specifically focusing on microglia; Schirmer et al. (2019) and Jäkel et al. (2019) both focus on different cell types but present microglia data in their supplementary files, albeit in relatively low numbers. Moreover, all three studies feature data from different brain areas. As microglia function or states can differ depending on the local environment, for example, gray versus white matter or demyelinated versus myelinated tissue (Bø et al., 2003; Geurts & Barkhof, 2008; Prins et al., 2015; Van Wageningen et al., 2019; Zrzavy et al., 2017), the lack of overlap between the three human MS RNA‐seq studies is perhaps not surprising. To disentangle the role of microglia in specific aspects of MS pathology such as demyelination, inflammation or neurodegeneration, more studies are needed that use a similar experimental set up and report, as extensively as possible, the pathology present within the samples used for sequencing.

Considerations

It is important to note that from our analysis we cannot make any statements about the overlap of differentially expressed genes between the entire generated datasets of several studies, as this would require a bioinformatic reanalysis of all currently deposited datasets. However, we can make inferences about the datasets reported within the manuscript and within supplementary data files, which is what most researchers without access to extensive bioinformatics analysis will use. Based on our observations, there is currently a large variation in which genes are reported within manuscripts, even when very similar research designs, animal models or tissues are used. Yet, we also observe that the lack of overlap in reported microglia DE genes can possibly be mitigated by standardizing the way genes are reported within the manuscript: overall, overlap in reported genes between manuscripts is increased when many genes are reported. This can be achieved by relaxing the selection criteria for genes that are reported within the manuscript or in the supplementary files. For example, the complete gene set found to be differentially expressed should be included, without extra restrictions on, for example, the minimal fold change. If this is not possible, the manuscript or supplementary files should state exactly which threshold was used to select genes (e.g., an FDR or adjusted p value <.05 and log fold change >2). If no supplementary files are allowed, these analyzed gene sets could be added to the unanalyzed (raw) data deposited in databases such as NCBI GEO. Perhaps more importantly, as we observed that microglia RNA‐seq data are reflective of the specific pathology present within the tissue used, more extensive reporting within the manuscript on the specific pathology present within the tissue used for sequencing (or, if not possible, adjacent tissue) would facilitate interpretation of the specific gene sets created. This can include, for example, the amount of amyloid beta plaques and hyperphosphorylated tau tangles for AD, or the presence or amount of inflammation and demyelination or remyelination in MS tissue. For researchers wanting to use microglia gene expression data, the above‐mentioned points are important to keep in mind when selecting genes as outcome measurements for further experiments.
In conclusion, one of the main advantages of using RNA‐seq is the generation of large unbiased datasets, which can be used either to elucidate novel genes regulated in specific pathological conditions or as a reference for further experiments; these data can thus be very powerful for scientists. Through various database initiatives, the unanalyzed (raw) data in the form of count matrices or FASTA files are often available to those with knowledge of bioinformatics. However, this is not the case for all research groups. In addition, reanalyzing all datasets can be time‐consuming even for a dedicated bioinformatician. Thus, in order to facilitate and spread the knowledge generated by RNA‐seq datasets, in this study we provide insight into how current microglia gene expression data from MS and AD tissue are reported, together with some considerations that may be adopted to increase the use of reported gene expression data by other researchers with or without bioinformatics experience.
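As one possible way to follow the reporting recommendation made in this section, a supplementary table could list every tested gene, record the exact thresholds in the file itself, and flag rather than drop the genes that fail them. The sketch below is a hypothetical illustration, not a format proposed by the authors: the function name `write_de_report`, the column names, and the example rows are all assumptions, and the default cut-offs simply mirror the example thresholds given in the text.

```python
import csv
import io

def write_de_report(rows, max_padj=0.05, min_abs_lfc=2.0):
    """Return CSV text listing every tested gene with the thresholds recorded.

    `rows` is an iterable of (gene, log_fold_change, adjusted_p) tuples.
    No gene is dropped; a `passes_threshold` column marks the stricter subset.
    """
    buf = io.StringIO()
    # The first line documents the exact cut-offs so that readers can
    # reproduce the filtered subset or choose their own filter instead.
    buf.write(
        f"# thresholds: adjusted_p < {max_padj}; "
        f"|log_fold_change| > {min_abs_lfc}\n"
    )
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene", "log_fold_change", "adjusted_p", "passes_threshold"])
    for gene, lfc, padj in sorted(rows):
        passes = padj < max_padj and abs(lfc) > min_abs_lfc
        writer.writerow([gene, lfc, padj, "yes" if passes else "no"])
    return buf.getvalue()

# Hypothetical rows for illustration only.
example_rows = [
    ("GeneB", -2.4, 0.003),
    ("GeneA", 1.1, 0.02),
    ("GeneC", 3.0, 0.2),
]

report = write_de_report(example_rows)
print(report)
```

A file in this shape could accompany the manuscript as a supplementary table or, where supplementary files are not allowed, be deposited alongside the raw data.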

CONFLICT OF INTEREST

The authors declare no potential conflict of interest.

Supporting Information: Table S1, Table S2, Table S3, Table S4.
REFERENCES (10 of 40 listed)

Review 1.  Microglia in Central Nervous System Inflammation and Multiple Sclerosis Pathology.

Authors:  Sofie Voet; Marco Prinz; Geert van Loo
Journal:  Trends Mol Med       Date:  2018-12-18       Impact factor: 11.951

2.  The TREM2-APOE Pathway Drives the Transcriptional Phenotype of Dysfunctional Microglia in Neurodegenerative Diseases.

Authors:  Susanne Krasemann; Charlotte Madore; Ron Cialic; Caroline Baufeld; Narghes Calcagno; Rachid El Fatimy; Lien Beckers; Elaine O'Loughlin; Yang Xu; Zain Fanek; David J Greco; Scott T Smith; George Tweet; Zachary Humulock; Tobias Zrzavy; Patricia Conde-Sanroman; Mar Gacias; Zhiping Weng; Hao Chen; Emily Tjon; Fargol Mazaheri; Kristin Hartmann; Asaf Madi; Jason D Ulrich; Markus Glatzel; Anna Worthmann; Joerg Heeren; Bogdan Budnik; Cynthia Lemere; Tsuneya Ikezu; Frank L Heppner; Vladimir Litvak; David M Holtzman; Hans Lassmann; Howard L Weiner; Jordi Ochando; Christian Haass; Oleg Butovsky
Journal:  Immunity       Date:  2017-09-19       Impact factor: 31.745

3.  A Unique Microglia Type Associated with Restricting Development of Alzheimer's Disease.

Authors:  Hadas Keren-Shaul; Amit Spinrad; Assaf Weiner; Orit Matcovitch-Natan; Raz Dvir-Szternfeld; Tyler K Ulland; Eyal David; Kuti Baruch; David Lara-Astaiso; Beata Toth; Shalev Itzkovitz; Marco Colonna; Michal Schwartz; Ido Amit
Journal:  Cell       Date:  2017-06-08       Impact factor: 41.582

Review 4.  Pathological differences between white and grey matter multiple sclerosis lesions.

Authors:  Marloes Prins; Emma Schul; Jeroen Geurts; Paul van der Valk; Benjamin Drukarch; Anne-Marie van Dam
Journal:  Ann N Y Acad Sci       Date:  2015-07-22       Impact factor: 5.691

5.  mRNA-Seq whole-transcriptome analysis of a single cell.

Authors:  Fuchou Tang; Catalin Barbacioru; Yangzhou Wang; Ellen Nordman; Clarence Lee; Nanlan Xu; Xiaohui Wang; John Bodeau; Brian B Tuch; Asim Siddiqui; Kaiqin Lao; M Azim Surani
Journal:  Nat Methods       Date:  2009-04-06       Impact factor: 28.547

6.  Intracortical multiple sclerosis lesions are not associated with increased lymphocyte infiltration.

Authors:  L Bø; C A Vedeler; H Nyland; B D Trapp; S J Mørk
Journal:  Mult Scler       Date:  2003-08       Impact factor: 6.312

7.  New tools for studying microglia in the mouse and human CNS.

Authors:  Mariko L Bennett; F Chris Bennett; Shane A Liddelow; Bahareh Ajami; Jennifer L Zamanian; Nathaniel B Fernhoff; Sara B Mulinyawe; Christopher J Bohlen; Aykezar Adil; Andrew Tucker; Irving L Weissman; Edward F Chang; Gordon Li; Gerald A Grant; Melanie G Hayden Gephart; Ben A Barres
Journal:  Proc Natl Acad Sci U S A       Date:  2016-02-16       Impact factor: 11.205

8.  Identification of a unique TGF-β-dependent molecular and functional signature in microglia.

Authors:  Oleg Butovsky; Mark P Jedrychowski; Craig S Moore; Ron Cialic; Amanda J Lanser; Galina Gabriely; Thomas Koeglsperger; Ben Dake; Pauline M Wu; Camille E Doykan; Zain Fanek; Liping Liu; Zhuoxun Chen; Jeffrey D Rothstein; Richard M Ransohoff; Steven P Gygi; Jack P Antel; Howard L Weiner
Journal:  Nat Neurosci       Date:  2013-12-08       Impact factor: 24.884

9.  Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data.

Authors:  Neo Christopher Chung; Błażej Miasojedow; Michał Startek; Anna Gambin
Journal:  BMC Bioinformatics       Date:  2019-12-24       Impact factor: 3.169

10.  Human and mouse single-nucleus transcriptomics reveal TREM2-dependent and TREM2-independent cellular responses in Alzheimer's disease.

Authors:  Yingyue Zhou; Wilbur M Song; Prabhakar S Andhey; Amanda Swain; Tyler Levy; Kelly R Miller; Pietro L Poliani; Manuela Cominelli; Shikha Grover; Susan Gilfillan; Marina Cella; Tyler K Ulland; Konstantin Zaitsev; Akinori Miyashita; Takeshi Ikeuchi; Makoto Sainouchi; Akiyoshi Kakita; David A Bennett; Julie A Schneider; Michael R Nichols; Sean A Beausoleil; Jason D Ulrich; David M Holtzman; Maxim N Artyomov; Marco Colonna
Journal:  Nat Med       Date:  2020-01-13       Impact factor: 53.440
